Comment 0 for bug 2069718

Stefan Hoffmann (shoffmann) wrote :

Problem: In environments with many hypervisors and VMs, a live migration leaves the migrated VM unreachable for several seconds (4-20s).

Description:
We run a big environment with many hypervisors and VMs, so northd reconcile cycles take some time.
During live migration, even though nova has live_migration_wait_for_vif_plug=true configured, the vif-plugged event from neutron is sent before northd has processed the change that adds the VM's port to the destination hypervisor and enables the multi-chassis feature.
Nova starts the live migration in libvirt, and the migration finishes before the southbound ovsdb and the ovn-controller on the destination have received the change.
So the VM is started on the destination hypervisor, but the port setup is not done yet.

From what I saw, neutron generates the vif-plugged event as soon as the transaction to the northbound ovsdb is finished [1].

Is there a way to wait until the change has been propagated to the southbound ovsdb?
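
To make the question concrete, here is a minimal sketch of the kind of wait I have in mind. It is not how neutron behaves today and not an existing neutron API: wait_until and the check callable are hypothetical names, and the check itself (for example "the port is bound on the destination chassis in the southbound ovsdb") is left to the caller.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.5) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    `check` is a hypothetical callable, e.g. one that reports whether the
    port change is already visible in the southbound ovsdb.
    Returns True if the check succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With something like this, the vif-plugged event would only be sent once wait_until(...) returns True, and a timeout would have to be handled explicitly (northd may be slow or down).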

Version:
neutron-server 21.2.1 zed / unmaintained/zed
ml2 plugin: ovn
at neutron: ovsdb-client (Open vSwitch) 3.3.0
Nova zed / unmaintained/zed
nova.conf: live_migration_wait_for_vif_plug=true (https://docs.openstack.org/nova/latest/configuration/config.html#compute.live_migration_wait_for_vif_plug)
Hypervisor OS: Ubuntu 22.04 with newer kernel (but that shouldn't be relevant here)

Steps to Reproduce:

1. Run neutron with an OVN setup and create a VM that you can ping (via a FIP or from another VM in the same private network)
2. Stop northd
3. Start a live migration of the VM
4. Wait until the live migration is done - the VM is not reachable anymore
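
To put a number on the outage while following the steps above, a small helper like the following could be used. This is only a sketch under assumptions: reachability is sampled by some external probe (for example one ping per second), and the sample format of (timestamp in seconds, reachable) tuples and the name longest_outage are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest unreachable period in seconds.

    `samples` is a time-ordered sequence of (timestamp, reachable) pairs.
    An outage runs from the first failed probe to the next successful one;
    an outage still open at the end is counted up to the last sample.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and outage_start is None:
            outage_start = ts          # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None        # outage ended
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest
```

With northd stopped as in step 2, the outage never ends, so the result only reflects how long the probe was running; with northd running but slow, it gives the downtime of the migration.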

[1] https://opendev.org/openstack/neutron/src/branch/unmaintained/zed/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L836