Comment 0 for bug 1944619

Erlon R. Cruz (sombrafam) wrote : Instances with SRIOV ports lose access after failed live migrations

If a live migration of an instance with an SRIOV port fails during the
'_pre_live_migration' hook, the instance loses access to the network and
duplicated port bindings are left behind in the database.
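
A minimal sketch of how the leftover bindings could be spotted, assuming the
port binding rows have already been exported as dictionaries with 'port_id',
'host' and 'status' keys. The record layout, the helper name and the sample
values are hypothetical and are not taken from this environment:

```python
from collections import defaultdict


def find_duplicated_bindings(bindings):
    """Return {port_id: sorted hosts} for ports bound to more than one host.

    `bindings` is an iterable of dicts with 'port_id' and 'host' keys.
    This layout is an assumption made for illustration only.
    """
    hosts_by_port = defaultdict(set)
    for binding in bindings:
        hosts_by_port[binding["port_id"]].add(binding["host"])
    # Keep only the ports that still have a binding on several hosts.
    return {
        port_id: sorted(hosts)
        for port_id, hosts in hosts_by_port.items()
        if len(hosts) > 1
    }


# Hypothetical sample rows: port-a is still bound on both hosts.
sample = [
    {"port_id": "port-a", "host": "source-host", "status": "ACTIVE"},
    {"port_id": "port-a", "host": "dest-host", "status": "INACTIVE"},
    {"port_id": "port-b", "host": "source-host", "status": "ACTIVE"},
]
duplicated = find_duplicated_bindings(sample)
```

Any port reported by this helper after the failed migration would be a
candidate for the duplicated bindings described above.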

The instance regains connectivity on the source host after a reboot (I don't
know whether there is another way to restore connectivity). As a side effect
of this behavior, the pre-live migration cleanup hook also fails with:

PCI device 0000:3b:10.0 is in use by driver QEMU

[How to reproduce]

Create an environment with SRIOV (our case uses switchdev[1])
Create 1 VM
Live-migrate the VM and provoke a failure in the _pre_live_migration step (for example, by creating a directory /var/lib/nova/instances/<instance id>)
Check the VM's connectivity
Check the logs for: libvirt.libvirtError: Requested operation is not valid: PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001
Full stack trace: [2]
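
The log check above can be sketched as a small script. This is only an
illustration: the regular expression is derived from the two error messages
quoted in this report, and the helper name find_pci_in_use is made up here,
it is not part of nova or libvirt:

```python
import re

# Matches the libvirt error quoted in this report. The ", domain ..." suffix
# is optional because the shorter form of the message omits it.
PCI_IN_USE = re.compile(
    r"PCI device "
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver QEMU(?:, domain (?P<domain>\S+))?"
)


def find_pci_in_use(lines):
    """Return (pci_address, domain) tuples for every matching log line.

    `domain` is None when the line does not name a libvirt domain.
    """
    hits = []
    for line in lines:
        match = PCI_IN_USE.search(line)
        if match:
            hits.append((match.group("pci"), match.group("domain")))
    return hits


# The error line from the reproduction steps, plus an unrelated line.
log_lines = [
    "libvirt.libvirtError: Requested operation is not valid: PCI device "
    "0000:03:04.1 is in use by driver QEMU, domain instance-00000001",
    "some unrelated log line",
]
hits = find_pci_in_use(log_lines)
```

In practice the lines would come from the nova-compute log on the affected
host; the log path depends on the deployment, so it is left out here.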

[Expected]

VM connectivity is restored, even if there is a brief disconnection

[Observed]
VM loses connectivity, which is only restored after the VM status is set to ERROR and the VM is power cycled

[1] https://paste.ubuntu.com/p/PzBM7y6Dbr/
[2] https://paste.ubuntu.com/p/ThQmDYtdSS/