Comment 15 for bug 1815989

yao ning (mslovy11022) wrote :

Hi Sean,

The root cause of this issue seems to be related to the new neutron port binding API. See: https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html

In the Rocky release, live migration activates the port through the port binding API during the post_live_migration step. However, libvirt starts the new instance, and the instance comes alive on the destination, before the port becomes active, so the RARP packets are lost.
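The ordering problem can be illustrated with a toy timeline. This is only a sketch: the function name and all timestamps are made up for illustration, and the only rule it models is that a RARP sent before the port becomes active on the destination is dropped.

```python
def delivered_rarps(rarp_send_times, port_active_at):
    """Return the RARP send times that reach the network.

    Toy model: a packet is delivered only if it is sent at or after the
    moment the destination port becomes active.
    """
    return [t for t in rarp_send_times if t >= port_active_at]


# Hypothetical timestamps in seconds, relative to the guest resuming on
# the destination host at t = 0.
rarp_send_times = [0.0, 0.15, 0.4, 0.75]

# Port activated in post_live_migration, after the guest is already running.
late = delivered_rarps(rarp_send_times, port_active_at=2.0)

# Port activated before the guest resumes on the destination.
early = delivered_rarps(rarp_send_times, port_active_at=-0.5)
```

With these made-up numbers, `late` is empty (every RARP is lost) while `early` contains all four announcements.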

We verified this by reverting the port binding API logic, after which the problem went away. The changes were as follows:
in neutron/agent/rpc.py
class CacheBackedPluginApi(PluginApi):

        if (port_obj.device_owner.startswith(
                constants.DEVICE_OWNER_COMPUTE_PREFIX) and
                binding[pb_ext.HOST] != host):
            LOG.debug("Device %s has no active binding in this host",
                      port_obj)
            return {'device': device,
                    n_const.NO_ACTIVE_BINDING: True}

We skip this if branch, so that the port is always in the active state.

We also need to skip the port binding API calls made by nova:
in nova/network/neutronv2/api.py
    def supports_port_binding_extension(self, context):
        """This is a simple check to see if the neutron "binding-extended"
        extension exists and is enabled.

        The "binding-extended" extension allows nova to bind a port to multiple
        hosts at the same time, like during live migration.

        :param context: the user request context
        :returns: True if the binding-extended API extension is available,
                  False otherwise
        """
        self._refresh_neutron_extensions_cache(context)
        return constants.PORT_BINDING_EXTENDED in self.extensions

We make supports_port_binding_extension return False unconditionally, so nova does not call the port binding API during live migration and the legacy path is used instead.
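For anyone who wants to reproduce this without editing the source, the override can be sketched as a temporary patch. This is a diagnostic sketch only, not a proposed fix. The `API` class below is a trimmed stand-in for the nova class, and `force_legacy_binding` is a hypothetical helper name.

```python
from unittest import mock


class API:
    """Trimmed stand-in for the nova neutron API class (illustration only)."""

    def __init__(self, extensions):
        self.extensions = extensions

    def supports_port_binding_extension(self, context):
        # Same idea as the real check: is "binding-extended" enabled?
        return "binding-extended" in self.extensions


def force_legacy_binding():
    """Return a context manager that makes the check always report False."""
    return mock.patch.object(
        API,
        "supports_port_binding_extension",
        lambda self, context: False,
    )


api = API(extensions={"binding-extended"})
before = api.supports_port_binding_extension(None)
with force_legacy_binding():
    during = api.supports_port_binding_extension(None)
after = api.supports_port_binding_extension(None)
```

Inside the `with` block the check reports False even though the extension is present; outside it the original behaviour is restored.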

We also confirmed this another way: when we manually call the activate port binding API to activate the port on the destination during the migration, before the VM becomes active on the destination, the problem also disappears.
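A manual activation like the one described above could be scripted roughly as follows. This is a sketch under assumptions: it assumes the activate call is the `PUT /v2.0/ports/{port_id}/bindings/{host}/activate` request from the neutron "binding-extended" API, and the endpoint, port ID, host name and token are placeholders to be supplied by the caller. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_activate_request(endpoint, port_id, host, token):
    """Build (but do not send) the PUT request that activates a binding.

    All four arguments are caller-supplied placeholders.
    """
    url = "{}/v2.0/ports/{}/bindings/{}/activate".format(
        endpoint.rstrip("/"),
        urllib.parse.quote(port_id, safe=""),
        urllib.parse.quote(host, safe=""),
    )
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
    )


def activate_binding(endpoint, port_id, host, token, timeout=10):
    """Send the activation request and return the HTTP status code."""
    request = build_activate_request(endpoint, port_id, host, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In the test described above, a call such as `activate_binding(endpoint, port_id, destination_host, token)` would have to be issued while the migration is still in progress, before the guest resumes on the destination.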

Since the neutron port binding API has its own advantages, how can we solve this properly? Is it possible to activate the port binding before the VM shuts down on the source host and starts running on the destination host? @sean mooney