unable to live migrate instance after update to queens

Bug #1784579 reported by Lars
This bug affects 6 people
Affects                   Status         Importance  Assigned to      Milestone
OpenStack Compute (nova)  Fix Released   Medium      Matt Riedemann
Ocata                     Fix Committed  Medium      Matt Riedemann
Pike                      Fix Committed  Medium      Matt Riedemann
Queens                    Fix Committed  Medium      Matt Riedemann
Rocky                     Fix Committed  Medium      Matt Riedemann

Bug Description

Description
===========
After upgrading from Pike to Queens we're unable to live-migrate instances.

Steps to reproduce
==================
Live-migrate an existing instance to another compute node:
$ nova live-migration <instance-ID>

Expected result
===============
Instance should be moved to new compute node successfully.

Actual result
=============
On the source compute node, nova raises exceptions [1] and the live migration fails. After the failed migration, the nova database table instance_info_caches contains wrong network information [2], and a restart of the nova-compute service then fails with the exception shown in [3]. To be able to start the nova-compute service again, we have to restore the database entry to the information it held before the failed live migration.

[1] Exceptions raised during live migration
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [req-c93d8481-b167-4582-915a-6ebab6990abd cd9715e9b4714bc6b4d77f15f12ba5a9 fa976f761aad4d378706dfc26ddf6004 - default default] [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] Pre live migration failed at compute008: NovaException_Remote: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
    res = self.dispatcher.dispatch(message)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
    result = func(ctxt, **new_args)

  File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 76, in wrapped
    function_name, call_dict, binary)

  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 67, in wrapped
    return f(self, context, *args, **kw)

  File "/usr/lib/python2.7/dist-packages/nova/compute/utils.py", line 976, in decorated_function
    return function(self, context, *args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 214, in decorated_function
    kwargs['instance'], e, sys.exc_info())

  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 202, in decorated_function
    return function(self, context, *args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5995, in pre_live_migration
    migrate_data)

  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 7618, in pre_live_migration
    self.plug_vifs(instance, network_info)

  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 833, in plug_vifs
    self.vif_driver.plug(instance, vif)

  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 767, in plug
    vif_obj = os_vif_util.nova_to_osvif_vif(vif)

  File "/usr/lib/python2.7/dist-packages/nova/network/os_vif_util.py", line 492, in nova_to_osvif_vif
    {'type': vif['type'], 'func': funcname})

NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] Traceback (most recent call last):
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6049, in _do_live_migration
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] block_migration, disk, dest, migrate_data)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/compute/rpcapi.py", line 798, in pre_live_migration
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] disk=disk, migrate_data=migrate_data)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] retry=self.retry)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] timeout=timeout, retry=retry)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] retry=retry)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 550, in _send
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] raise result
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] NovaException_Remote: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] Traceback (most recent call last):
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] res = self.dispatcher.dispatch(message)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] return self._do_dispatch(endpoint, method, ctxt, args)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] result = func(ctxt, **new_args)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 76, in wrapped
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] function_name, call_dict, binary)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] self.force_reraise()
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] six.reraise(self.type_, self.value, self.tb)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 67, in wrapped
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] return f(self, context, *args, **kw)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/compute/utils.py", line 976, in decorated_function
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] return function(self, context, *args, **kwargs)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 214, in decorated_function
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] kwargs['instance'], e, sys.exc_info())
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] self.force_reraise()
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] six.reraise(self.type_, self.value, self.tb)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 202, in decorated_function
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] return function(self, context, *args, **kwargs)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5995, in pre_live_migration
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] migrate_data)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 7618, in pre_live_migration
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] self.plug_vifs(instance, network_info)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 833, in plug_vifs
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] self.vif_driver.plug(instance, vif)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 767, in plug
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] vif_obj = os_vif_util.nova_to_osvif_vif(vif)
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] File "/usr/lib/python2.7/dist-packages/nova/network/os_vif_util.py", line 492, in nova_to_osvif_vif
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] {'type': vif['type'], 'func': funcname})
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2]
2018-07-31 09:04:38.353 9502 ERROR nova.compute.manager [instance: def90dd6-4aa9-46bd-adb4-28e46c6204e2] NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'

[2]
BEFORE live migration (Table nova.instance_info_caches)
| 2018-07-31 07:43:53 | 2018-07-31 08:19:24 | NULL | 8418 | [{"profile": {}, "ovs_interfaceid": "a877229d-5d1e-4b3f-8598-ff5a160f5320", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.1.80.48"}], "version": 4, "meta": {"dhcp_server": "10.1.80.13"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "10.1.10.52"}, {"meta": {}, "version": 4, "type": "dns", "address": "10.1.10.53"}], "routes": [], "cidr": "10.1.80.0/20", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.1.80.1"}}], "meta": {"injected": false, "tenant_id": "fa976f761aad4d378706dfc26ddf6004", "mtu": 1500}, "id": "43e1e8fa-a92b-4d9e-8f70-384090a2beb7", "label": "vlan-managed-light"}, "devname": "tapa877229d-5d", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:ff:cf:a3", "active": true, "type": "ovs", "id": "a877229d-5d1e-4b3f-8598-ff5a160f5320", "qbg_params": null}] | cbbd5e5b-4c5b-41f5-9506-ea8548c09e6c | 0 |

AFTER failed live migration (Table nova.instance_info_caches)
| 2018-07-31 07:43:53 | 2018-07-31 08:19:24 | NULL | 8418 | [{"profile": {}, "ovs_interfaceid": null, "preserve_on_delete": false, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.1.80.48"}], "version": 4, "meta": {"dhcp_server": "10.1.80.13"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "10.1.10.52"}, {"meta": {}, "version": 4, "type": "dns", "address": "10.1.10.53"}], "routes": [], "cidr": "10.1.80.0/20", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.1.80.1"}}], "meta": {"injected": false, "tenant_id": "fa976f761aad4d378706dfc26ddf6004", "mtu": 1500}, "id": "43e1e8fa-a92b-4d9e-8f70-384090a2beb7", "label": "vlan-managed-light"}, "devname": "tapa877229d-5d", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {}, "address": "fa:16:3e:ff:cf:a3", "active": false, "type": "binding_failed", "id": "a877229d-5d1e-4b3f-8598-ff5a160f5320", "qbg_params": null}] | cbbd5e5b-4c5b-41f5-9506-ea8548c09e6c | 0 |

[3] Exceptions raised on nova-compute startup
2018-07-31 10:26:28.347 20605 DEBUG nova.network.os_vif_util [req-0e1d6a7a-b07c-488e-80dd-9548aa7f84fb - - - - -] Converting VIF {"profile": {}, "ovs_interfaceid": null, "preserve_on_delete": false, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.1.80.48"}], "version": 4, "meta": {"dhcp_server": "10.1.80.13"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "10.1.10.52"}, {"meta": {}, "version": 4, "type": "dns", "address": "10.1.10.53"}], "routes": [], "cidr": "10.1.80.0/20", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.1.80.1"}}], "meta": {"injected": false, "tenant_id": "fa976f761aad4d378706dfc26ddf6004", "mtu": 1500}, "id": "43e1e8fa-a92b-4d9e-8f70-384090a2beb7", "label": "vlan-managed-light"}, "devname": "tapa877229d-5d", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {}, "address": "fa:16:3e:ff:cf:a3", "active": false, "type": "binding_failed", "id": "a877229d-5d1e-4b3f-8598-ff5a160f5320", "qbg_params": null} nova_to_osvif_vif /usr/lib/python2.7/dist-packages/nova/network/os_vif_util.py:484
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service [req-0e1d6a7a-b07c-488e-80dd-9548aa7f84fb - - - - -] Error starting thread.: NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service Traceback (most recent call last):
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 729, in run_service
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service service.start()
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/service.py", line 161, in start
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service self.manager.init_host()
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1153, in init_host
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service self._init_instance(context, instance)
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 950, in _init_instance
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service self.driver.plug_vifs(instance, net_info)
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 833, in plug_vifs
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service self.vif_driver.plug(instance, vif)
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 767, in plug
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service vif_obj = os_vif_util.nova_to_osvif_vif(vif)
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/nova/network/os_vif_util.py", line 492, in nova_to_osvif_vif
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service {'type': vif['type'], 'func': funcname})
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'
2018-07-31 10:26:28.360 20605 ERROR oslo_service.service
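
For context, a simplified sketch of why a cached vif type of "binding_failed" turns into the generic NovaException seen in both tracebacks. This mirrors the shape of nova.network.os_vif_util.nova_to_osvif_vif (converter lookup by name), not its exact code:

    class NovaException(Exception):
        """Stand-in for nova.exception.NovaException, for illustration only."""

    def nova_to_osvif_vif(vif, converters):
        # Look up a converter named after the cached vif type, e.g.
        # '_nova_to_osvif_vif_ovs' for type 'ovs'.
        funcname = '_nova_to_osvif_vif_' + vif['type']
        func = converters.get(funcname)
        if func is None:
            # A cached type of 'binding_failed' has no converter, so the
            # generic exception is raised and init_host aborts.
            raise NovaException(
                "Unsupported VIF type %(type)s convert '%(func)s'"
                % {'type': vif['type'], 'func': funcname})
        return func(vif)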

Environment
===========
Compute node (dpkg -l | grep nova)
ii nova-common 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python-nova 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 2:9.1.1-0ubuntu1~cloud0 all client library for OpenStack Compute API - Python 2.7

Hypervisor
ii nova-compute-kvm 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii qemu-kvm 1:2.11+dfsg-1ubuntu7.4~cloud0 amd64 QEMU Full virtualization on x86 hardware

ii libvirt-bin 4.0.0-1ubuntu8.3~cloud0 amd64 programs for the libvirt library
ii libvirt-clients 4.0.0-1ubuntu8.3~cloud0 amd64 Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.3~cloud0 amd64 Virtualization daemon
ii libvirt-daemon-driver-storage-rbd 4.0.0-1ubuntu8.3~cloud0 amd64 Virtualization daemon RBD storage driver
ii libvirt-daemon-system 4.0.0-1ubuntu8.3~cloud0 amd64 Libvirt daemon configuration files
ii libvirt0:amd64 4.0.0-1ubuntu8.3~cloud0 amd64 library for interfacing with different virtualization systems

Revision history for this message
Dan Smith (danms) wrote :

I think you want to get/post neutron logs for this. I think the "binding failed" is coming back from neutron and we're naively using that as the vif_type.

tags: added: live-migration
Revision history for this message
Matt Riedemann (mriedem) wrote :

Well, clearly at some point vif binding failed, and that's why you now have a "binding_failed" vif type in the info cache for the instance. Did you check the neutron agent logs on the source and/or dest hosts to see why binding failed? That's the root issue. I've also seen cases where the vif type is "unbound". It seems nova should probably *not* store that "binding_failed" vif type in the info cache if there is another value already in the cache, but that might not be trivial to determine based on how the code is structured. (We could always store off the vif types in the cache *before* getting the latest information from neutron, do that comparison afterward, and filter out any vif type changes from something like "ovs" to "binding_failed", since we know that won't work if we try to plug/unplug the vif.)

To summarize, it looks like the pre_live_migration method on the destination host fails to plug vifs and you end up with the "binding_failed" error, which is raised and makes the source live_migration method fail as expected. The failure is on the dest host. As a result, the info cache is updated with "binding_failed" which causes the source compute restart to fail here:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L958

Note that we're already handling VirtualInterfacePlugException but not the more generic:

"NovaException: Unsupported VIF type binding_failed convert '_nova_to_osvif_vif_binding_failed'"

We should (1) fix the _init_instance logic to also handle that error so we don't fail to start the compute and (2) then you should be able to reboot the instance to fix the networking - also investigate the vif plugging failures on the destination host (compute008?).

Revision history for this message
sean mooney (sean-k-mooney) wrote :

@dan
binding failed is an actual semi-valid vif type in neutron
https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/portbindings.py#L86
i say semi valid as it represent the fact that neutron was not able to find an ml2 capable of configuring the newtorking on the host for that interface.

we would need both the q-agt log and the q-svc log to determine why the binding failed in neutron.
the actull error shoudl be in the neutron server log (q-svc) but it could be a reuslt of an error in the neutron ovs l2 agent (q-agt). for example if the q-agt was stopped the binding would fail for that host.

i would also guess we are missing binding the port back to the souce in the cases where the prelive migration fails to bind it to the dest node.
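
The "semi-valid" vif types mentioned above are defined as constants in neutron-lib; the snippet below assumes a neutron-lib installation where these constant names exist (they do in current releases):

    from neutron_lib.api.definitions import portbindings

    print(portbindings.VIF_TYPE_BINDING_FAILED)  # 'binding_failed'
    print(portbindings.VIF_TYPE_UNBOUND)         # 'unbound'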

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/587498

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

We don't actually do any port binding on the destination host (in Queens) during pre_live_migration, so I'm wondering if the instance info_cache already had the bad "binding_failed" vif type in it before you started the live migration. We don't change the port's host_id to the dest host until we get to _post_live_migration, so I'm guessing this instance was already messed up before you attempted to live-migrate it. Do you have any other errors in the logs for this instance and/or port *before* the live migration attempt?

Revision history for this message
Lars (l4rs) wrote :

Unfortunately there are no previous errors in the logfile. It looks like something bad happened during the upgrade, because we had some trouble with our RabbitMQ cluster and duplicate messages. Maybe the live-migration request was cached, and after fixing the duplicate-message problem (we had to reinitialize our RabbitMQ cluster) the instance info cache was still wrong (state: migrating to). The only way to resolve the errors above (NovaException_Remote: Unsupported VIF type ...) was to remove the wrongly cached port from the instance. With newly created instances there was no problem. What I don't understand is where this information comes from: I updated the wrongly cached information in the database with the information from before the upgrade, and as soon as I started a new action (e.g. reboot) the instance cache information had the state "migrating to" again. So where does this come from?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/587498
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0
Submitter: Zuul
Branch: master

commit cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 31 11:20:47 2018 -0400

    Handle binding_failed vif plug errors on compute restart

    Like change Ia584dba66affb86787e3069df19bd17b89cb5c49 which
    came before, if port binding fails and we have a "binding_failed"
    vif type in the info cache, we'll fail to plug vifs for an
    instance on compute restart which will prevent the service
    from restarting. Before the os-vif conversion code, this was
    handled with VirtualInterfacePlugException but the os-vif conversion
    code fails in a different way by raising a generic NovaException
    because the os-vif conversion utility doesn't handle a vif_type of
    "binding_failed".

    To resolve this and make the os-vif flow for binding_failed behave
    the same as the legacy path, we implement a translation function
    in os_vif_util for binding_failed which will make the plug_vifs
    code raise VirtualInterfacePlugException which is what the
    _init_instance code in ComputeManager is already handling.

    Admittedly this isn't the smartest thing and doesn't attempt
    to recover / fix the instance networking info, but it at least
    gives a more clear indication of what's wrong and lets the
    nova-compute service start up. A note is left in the
    _init_instance error handling that we could potentially try
    to heal binding_failed vifs in _heal_instance_info_cache, but
    that would need to be done in a separate change since it's more
    invasive.

    Change-Id: Ia963a093a1b26d90b4de2e8fc623031cf175aece
    Closes-Bug: #1784579
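
A minimal sketch of the kind of translation function the commit message describes (the function name comes from the error message above; the exact body in nova may differ, and VirtualInterfacePlugException is the existing nova exception that _init_instance already tolerates):

    from nova import exception

    def _nova_to_osvif_vif_binding_failed(vif):
        """Converter for the 'binding_failed' vif type.

        There is nothing to plug for a VIF whose port binding failed, so
        raise the exception the compute manager's startup path already
        handles instead of the generic NovaException.
        """
        raise exception.VirtualInterfacePlugException(
            'Port binding failed for VIF %s' % vif.get('id'))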

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/595317

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/595317
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a890e3d624a84d8eb0306fab580e2cec33e26bc3
Submitter: Zuul
Branch: stable/rocky

commit a890e3d624a84d8eb0306fab580e2cec33e26bc3
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 31 11:20:47 2018 -0400

    Handle binding_failed vif plug errors on compute restart

    Like change Ia584dba66affb86787e3069df19bd17b89cb5c49 which
    came before, if port binding fails and we have a "binding_failed"
    vif type in the info cache, we'll fail to plug vifs for an
    instance on compute restart which will prevent the service
    from restarting. Before the os-vif conversion code, this was
    handled with VirtualInterfacePlugException but the os-vif conversion
    code fails in a different way by raising a generic NovaException
    because the os-vif conversion utility doesn't handle a vif_type of
    "binding_failed".

    To resolve this and make the os-vif flow for binding_failed behave
    the same as the legacy path, we implement a translation function
    in os_vif_util for binding_failed which will make the plug_vifs
    code raise VirtualInterfacePlugException which is what the
    _init_instance code in ComputeManager is already handling.

    Admittedly this isn't the smartest thing and doesn't attempt
    to recover / fix the instance networking info, but it at least
    gives a more clear indication of what's wrong and lets the
    nova-compute service start up. A note is left in the
    _init_instance error handling that we could potentially try
    to heal binding_failed vifs in _heal_instance_info_cache, but
    that would need to be done in a separate change since it's more
    invasive.

    Change-Id: Ia963a093a1b26d90b4de2e8fc623031cf175aece
    Closes-Bug: #1784579
    (cherry picked from commit cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.1

This issue was fixed in the openstack/nova 18.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/626218

summary: - unable to live migrate instance after update to queens
+ Unsupported VIF type unbound convert '_nova_to_osvif_vif_binding_failed'
+ on compute restart
summary: - Unsupported VIF type unbound convert '_nova_to_osvif_vif_binding_failed'
- on compute restart
+ unable to live migrate instance after update to queens
Revision history for this message
Matt Riedemann (mriedem) wrote :

The regression goes back to newton: https://review.openstack.org/#/c/350595/

Revision history for this message
Matt Riedemann (mriedem) wrote :

Note that https://review.openstack.org/#/c/587498/ somewhat incorrectly says it fixes this bug; it doesn't really. It fixes a symptom of this bug, namely that after the failed port binding during live migration, restarting the source compute fails (it's more aligned with bug 1738373).

The port binding failure could have been due to the neutron agent being down on the destination host during live migration, or maybe that host was out of fixed IPs, something like that.

The root issue is nova saving off the binding_failed vif_type in the instance info_cache which led to the failure to restart nova-compute later.

I suspect the binding_failed data gets put into the instance info_cache when the source compute receives a network-changed event from neutron after the port binding failure, which changed the vif_type on the port; nova then saves that change into the info_cache.

There are a couple of related fixes for that binding_failed info_cache value:

1. https://review.openstack.org/#/c/603844/ - that would be a manual recovery action to try and reboot/rebuild the instance to force a re-binding of the port on the original host and fix the binding failure.

2. https://review.openstack.org/#/c/591607/ - that would force the info_cache to be refreshed periodically from the actual current state of the port in neutron, rather than what is in the info_cache and could be wrong/out of date if the port binding was later fixed once the neutron agent was brought back online?

--

Alternatively, nova could ignore binding_failed vif_type changes during network-changed events, but that might lead to weird side effects if nova's version of the port state (in the info_cache) is different from the actual state in neutron.
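
A rough sketch of that "ignore binding_failed vif_type changes" idea (hypothetical helper, not nova code; old_vifs/new_vifs stand for the cached and freshly fetched network_info lists):

    UNUSABLE_VIF_TYPES = ('binding_failed', 'unbound')

    def filter_vif_type_regressions(old_vifs, new_vifs):
        """Keep the previously cached VIF when neutron now reports a failed
        or unbound binding for the same port, so a transient binding failure
        does not overwrite a previously good info_cache entry."""
        old_by_id = {vif['id']: vif for vif in old_vifs}
        merged = []
        for vif in new_vifs:
            old = old_by_id.get(vif['id'])
            if (old is not None
                    and vif.get('type') in UNUSABLE_VIF_TYPES
                    and old.get('type') not in UNUSABLE_VIF_TYPES):
                merged.append(old)  # ignore the regression to binding_failed
            else:
                merged.append(vif)
        return merged

As noted above, doing this would let the cache diverge from the actual port state in neutron, which is the "weird side effects" concern.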

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

After discussions with mriedem on IRC, it's worth noting that the above patch doesn't fix the underlying issue so much as a side effect of it, namely the inability of nova-compute to restart after the error has occurred. What it does fix is bug #1738373, which is solely focused on that side effect. That bug has now been marked as a duplicate of this one.

Cleaned up logs from the IRC discussion on #nova-compute below.

[19-12 15:12:52] <stephenfin> mriedem: Not to distract you now, but did you make a mistake on https://github.com/openstack/nova/commit/cdf8ba5acb ? You've said it fixes https://bugs.launchpad.net/nova/+bug/1784579 but that bug is for live migration, not compute service restart which is what your commit addresses
[19-12 15:13:18] <stephenfin> mriedem: I ask because I found a similar bug which does deal with the compute service restart https://bugs.launchpad.net/nova/+bug/1738373
[19-12 15:18:06] <mriedem> stephenfin: yes bug 1784579 is about os-vif port binding failed errors right?
[19-12 15:19:14] <stephenfin> mriedem: Yup, but it's to do with live migration and the fix is only for the service startup code path
[19-12 15:19:40] <stephenfin> At least, assuming I'm reading it right. I'll do some digging but just wanted to sanity check it before I dived down the rabbit hole :)
[19-12 15:19:56] <mriedem> stephenfin: the live migration fails because of the port binding failures
[19-12 15:21:15] <mriedem> stephenfin: comment
[19-12 15:21:16] <mriedem> 2
[19-12 15:21:17] <mriedem> "To summarize, it looks like the pre_live_migration method on the destination host fails to plug vifs and you end up with the "binding_failed" error, which is raised and makes the source live_migration method fail as expected. The failure is on the dest host. As a result, the info cache is updated with "binding_failed" which causes the source compute restart to fail here:"
[19-12 15:22:19] <mriedem> stephenfin: so no i didn't fix the original reason for the port binding failure in pre_live_migration, because that could have been for any number of reasons (neutron agent was down on the dest host?)
[19-12 15:22:38] <mriedem> i fixed a symptom of that failure, which was nova-compute failed to restart after that failure
[19-12 15:22:53] <mriedem> as the commit message says, "Admittedly this isn't the smartest thing and doesn't attempt
[19-12 15:22:54] <mriedem>     to recover / fix the instance networking info"
[19-12 15:22:59] <stephenfin> mriedem: I'm missing something. Why make changes to 'ComputeManager.init_host' (via '_init_instance') in that commit? The exception was being seen in the live migration flow
[19-12 15:23:01] <stephenfin> ahhhhh
[19-12 15:23:21] <mriedem> 1. live migration fails, port binding failed - that gets saved in the info cache
[19-12 15:23:31] <mriedem> 2. restart source compute - that blows up because it wasn't handling binding_failed vif types in the os-vif conversion code
[19-12 15:23:38] <mriedem> i handle #2
[19-12 15:23:46] <mriedem> #1 is sort of out of my control
[19-12 15:23:50] <stephenfin> Your fix would inadvertently resolve https://bugs.launchpad.net/nova/+bug/1738373 so
[19-12 15:24:12] <mriedem> i mean, we probab...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/626228

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/626361

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/626369

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/626228
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1def76a1c49032d93ab6c7ee61dbbfe8e29cafca
Submitter: Zuul
Branch: master

commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this is
    being cached and stop that happening but for now we simply need to allow
    the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound which
    will make the plug_vifs code raise VirtualInterfacePlugException which
    is what the _init_instance code in ComputeManager is already handling.

    This has the same caveats as that change, namely that there may be
    smarter ways to do this that we should explore. However, that change
    also included a note which goes someway to explaining this.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/626410

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/626410
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bc0a5d0355311641daa87b46e311ae101f1817ad
Submitter: Zuul
Branch: stable/rocky

commit bc0a5d0355311641daa87b46e311ae101f1817ad
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this is
    being cached and stop that happening but for now we simply need to allow
    the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound which
    will make the plug_vifs code raise VirtualInterfacePlugException which
    is what the _init_instance code in ComputeManager is already handling.

    This has the same caveats as that change, namely that there may be
    smarter ways to do this that we should explore. However, that change
    also included a note which goes someway to explaining this.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/626550

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/626554

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/626556

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/626218
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4827cedbc56033c2ac3caf0d7998fca6aff997d6
Submitter: Zuul
Branch: stable/queens

commit 4827cedbc56033c2ac3caf0d7998fca6aff997d6
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 31 11:20:47 2018 -0400

    Handle binding_failed vif plug errors on compute restart

    Like change Ia584dba66affb86787e3069df19bd17b89cb5c49 which
    came before, if port binding fails and we have a "binding_failed"
    vif type in the info cache, we'll fail to plug vifs for an
    instance on compute restart which will prevent the service
    from restarting. Before the os-vif conversion code, this was
    handled with VirtualInterfacePlugException but the os-vif conversion
    code fails in a different way by raising a generic NovaException
    because the os-vif conversion utility doesn't handle a vif_type of
    "binding_failed".

    To resolve this and make the os-vif flow for binding_failed behave
    the same as the legacy path, we implement a translation function
    in os_vif_util for binding_failed which will make the plug_vifs
    code raise VirtualInterfacePlugException which is what the
    _init_instance code in ComputeManager is already handling.

    Admittedly this isn't the smartest thing and doesn't attempt
    to recover / fix the instance networking info, but it at least
    gives a more clear indication of what's wrong and lets the
    nova-compute service start up. A note is left in the
    _init_instance error handling that we could potentially try
    to heal binding_failed vifs in _heal_instance_info_cache, but
    that would need to be done in a separate change since it's more
    invasive.

    Change-Id: Ia963a093a1b26d90b4de2e8fc623031cf175aece
    Closes-Bug: #1784579
    (cherry picked from commit cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0)
    (cherry picked from commit a890e3d624a84d8eb0306fab580e2cec33e26bc3)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/626550
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=79a90d37027b7ca131218e16eaee70d6d5152206
Submitter: Zuul
Branch: stable/queens

commit 79a90d37027b7ca131218e16eaee70d6d5152206
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this is
    being cached and stop that happening but for now we simply need to allow
    the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound which
    will make the plug_vifs code raise VirtualInterfacePlugException which
    is what the _init_instance code in ComputeManager is already handling.

    This has the same caveats as that change, namely that there may be
    smarter ways to do this that we should explore. However, that change
    also included a note which goes someway to explaining this.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/626361
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=254a19f0d326ae5d3b5890d0d5fc735a771fcc0b
Submitter: Zuul
Branch: stable/pike

commit 254a19f0d326ae5d3b5890d0d5fc735a771fcc0b
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 31 11:20:47 2018 -0400

    Handle binding_failed vif plug errors on compute restart

    Like change Ia584dba66affb86787e3069df19bd17b89cb5c49 which
    came before, if port binding fails and we have a "binding_failed"
    vif type in the info cache, we'll fail to plug vifs for an
    instance on compute restart which will prevent the service
    from restarting. Before the os-vif conversion code, this was
    handled with VirtualInterfacePlugException but the os-vif conversion
    code fails in a different way by raising a generic NovaException
    because the os-vif conversion utility doesn't handle a vif_type of
    "binding_failed".

    To resolve this and make the os-vif flow for binding_failed behave
    the same as the legacy path, we implement a translation function
    in os_vif_util for binding_failed which will make the plug_vifs
    code raise VirtualInterfacePlugException which is what the
    _init_instance code in ComputeManager is already handling.

    Admittedly this isn't the smartest thing and doesn't attempt
    to recover / fix the instance networking info, but it at least
    gives a more clear indication of what's wrong and lets the
    nova-compute service start up. A note is left in the
    _init_instance error handling that we could potentially try
    to heal binding_failed vifs in _heal_instance_info_cache, but
    that would need to be done in a separate change since it's more
    invasive.

    Change-Id: Ia963a093a1b26d90b4de2e8fc623031cf175aece
    Closes-Bug: #1784579
    (cherry picked from commit cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0)
    (cherry picked from commit a890e3d624a84d8eb0306fab580e2cec33e26bc3)
    (cherry picked from commit 4827cedbc56033c2ac3caf0d7998fca6aff997d6)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/626554
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7b4f5725f821ef89176ef69f036471eaaf8a6201
Submitter: Zuul
Branch: stable/pike

commit 7b4f5725f821ef89176ef69f036471eaaf8a6201
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this is
    being cached and stop that happening but for now we simply need to allow
    the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound which
    will make the plug_vifs code raise VirtualInterfacePlugException which
    is what the _init_instance code in ComputeManager is already handling.

    This has the same caveats as that change, namely that there may be
    smarter ways to do this that we should explore. However, that change
    also included a note which goes someway to explaining this.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)
    (cherry picked from commit 79a90d37027b7ca131218e16eaee70d6d5152206)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/603844
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=542635034882e1b6897e1935f09d6feb6e77d1ce
Submitter: Zuul
Branch: master

commit 542635034882e1b6897e1935f09d6feb6e77d1ce
Author: Jack Ding <email address hidden>
Date: Wed Sep 19 11:54:44 2018 -0400

    Correct instance port binding for rebuilds

    The following 2 scenarios could result in an instance with incorrect
    port binding and cause subsequent rebuilds to fail.

    If an evacuation of an instance fails part way through, after the point
    where we reassign the port binding to the new host but before we change
    the instance host, we end up with the ports assigned to the wrong host.
    This change adds a check to determine if there's any port binding host
    mismatches and if so trigger setup of instance network.

    During recovery of failed hosts, neutron could get overwhelmed and lose
    messages, for example when active controller was powered-off in the
    middle of instance evacuations. In this case the vif_type was set to
    'binding_failed' or 'unbound'. We subsequently hit "Unsupported VIF
    type" exception during instance hard_reboot or rebuild, leaving the
    instance unrecoverable.

    This commit changes _heal_instance_info_cache periodic task to update
    port binding if evacuation fails due to above errors so that the
    instance can be recovered later.

    Closes-Bug: #1659062
    Related-Bug: #1784579

    Co-Authored-By: Gerry Kopec <email address hidden>
    Co-Authored-By: Jim Gauld <email address hidden>
    Change-Id: I75fd15ac2a29e420c09499f2c41d11259ca811ae
    Signed-off-by: Jack Ding <email address hidden>
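
A loose sketch of the periodic-task check that commit describes (helper names such as bind_port_to_host are illustrative, not actual nova or neutron API):

    FAILED_VIF_TYPES = ('binding_failed', 'unbound')

    def heal_port_bindings(instance_host, ports, bind_port_to_host):
        """Re-trigger port binding on the instance's current host when a
        port is bound to the wrong host or its previous binding failed."""
        for port in ports:
            wrong_host = port.get('binding:host_id') != instance_host
            failed = port.get('binding:vif_type') in FAILED_VIF_TYPES
            if wrong_host or failed:
                bind_port_to_host(port['id'], instance_host)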

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.10

This issue was fixed in the openstack/nova 17.0.10 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/626369
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d6491167a4249bd6d57b8ba3597c021332ee7420
Submitter: Zuul
Branch: stable/ocata

commit d6491167a4249bd6d57b8ba3597c021332ee7420
Author: Matt Riedemann <email address hidden>
Date: Tue Jul 31 11:20:47 2018 -0400

    Handle binding_failed vif plug errors on compute restart

    Like change Ia584dba66affb86787e3069df19bd17b89cb5c49 which
    came before, if port binding fails and we have a "binding_failed"
    vif type in the info cache, we'll fail to plug vifs for an
    instance on compute restart which will prevent the service
    from restarting. Before the os-vif conversion code, this was
    handled with VirtualInterfacePlugException but the os-vif conversion
    code fails in a different way by raising a generic NovaException
    because the os-vif conversion utility doesn't handle a vif_type of
    "binding_failed".

    To resolve this and make the os-vif flow for binding_failed behave
    the same as the legacy path, we implement a translation function
    in os_vif_util for binding_failed which will make the plug_vifs
    code raise VirtualInterfacePlugException which is what the
    _init_instance code in ComputeManager is already handling.

    Admittedly this isn't the smartest thing and doesn't attempt
    to recover / fix the instance networking info, but it at least
    gives a more clear indication of what's wrong and lets the
    nova-compute service start up. A note is left in the
    _init_instance error handling that we could potentially try
    to heal binding_failed vifs in _heal_instance_info_cache, but
    that would need to be done in a separate change since it's more
    invasive.

    Conflicts:
          nova/compute/manager.py
          nova/tests/unit/network/test_os_vif_util.py

    NOTE(mriedem): The compute manager conflicts are due to change
    I2740ea14e0c4ecee0d91c7f3e401b2c29498d097 in Queens. The _LE()
    marker has to be left intact for pep8 checks in Ocata. The
    test_os_vif_util conflicts are due to not having change
    Ic23effc05c901575f608f2b4c5ccd2b1fb3c2d5a nor change
    I3f38954bc5cf7b1690182dc8af45078eea275aa4 in Ocata.

    Change-Id: Ia963a093a1b26d90b4de2e8fc623031cf175aece
    Closes-Bug: #1784579
    (cherry picked from commit cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0)
    (cherry picked from commit a890e3d624a84d8eb0306fab580e2cec33e26bc3)
    (cherry picked from commit 4827cedbc56033c2ac3caf0d7998fca6aff997d6)
    (cherry picked from commit 254a19f0d326ae5d3b5890d0d5fc735a771fcc0b)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/626556
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e61b1d7d72470a95068470d67779e08ececdb2e5
Submitter: Zuul
Branch: stable/ocata

commit e61b1d7d72470a95068470d67779e08ececdb2e5
Author: Stephen Finucane <email address hidden>
Date: Wed Dec 19 16:03:22 2018 +0000

    Handle unbound vif plug errors on compute restart

    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this is
    being cached and stop that happening but for now we simply need to allow
    the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound which
    will make the plug_vifs code raise VirtualInterfacePlugException which
    is what the _init_instance code in ComputeManager is already handling.

    This has the same caveats as that change, namely that there may be
    smarter ways to do this that we should explore. However, that change
    also included a note which goes someway to explaining this.

    Conflicts:
     nova/compute/manager.py
     nova/tests/unit/network/test_os_vif_util.py

    NOTE(sfinucan): As with the 'stable/ocata' backport of change
    Ia963a093a1b26d90b4de2e8fc623031cf175aece, the compute manager conflicts
    are due to change I2740ea14e0c4ecee0d91c7f3e401b2c29498d097 in Queens.
    The _LE() marker has to be left intact for pep8 checks in Ocata. The
    test_os_vif_util conflicts are due to not having change
    Ic23effc05c901575f608f2b4c5ccd2b1fb3c2d5a nor change
    I3f38954bc5cf7b1690182dc8af45078eea275aa4 in Ocata

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)
    (cherry picked from commit 79a90d37027b7ca131218e16eaee70d6d5152206)
    (cherry picked from commit 7b4f5725f821ef89176ef69f036471eaaf8a6201)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova ocata-eol

This issue was fixed in the openstack/nova ocata-eol release.
