When live migration fails due to an internal error, the rollback is not handled correctly.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Matt Riedemann |
Rocky | In Progress | Medium | Lee Yarwood |
Stein | Fix Committed | Medium | Matt Riedemann |
Train | Fix Committed | Medium | Matt Riedemann |
Bug Description
Description
===========
While testing live migration between ovs and ovs-dpdk on CentOS 7.5, I encountered
a QEMU internal error that resulted in the QEMU monitor closing.
When the migration failed during the virDomainMigrat call, nova
rolled back the migration, leaving the VM running on the source node.
The VM, however, no longer had network connectivity, because the VM port
on the source node had no vif type, as the host-id in the port was no longer set.
This is likely due to not reactivating the source node binding when the migration fails.
If the VM is hard-rebooted in this state, it enters the error state, as the vif bindings are broken.
Note: the QEMU error itself is out of scope for this bug. This bug focuses only on the fact
that nova does not correctly roll back the migration so that the VM is left running
correctly, with networking, on the source node when a QEMU error is thrown.
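The rollback behavior this report asks for can be sketched as follows. This is an illustrative model only, not nova's actual code: `NeutronStub` and its method names are hypothetical stand-ins for neutron's multiple-port-bindings API.

```python
class NeutronStub:
    """Minimal in-memory stand-in for neutron's port-bindings API."""

    def __init__(self):
        # port_id -> {host: {"status": "ACTIVE" | "INACTIVE"}}
        self.bindings = {}

    def create_binding(self, port_id, host):
        # New bindings start inactive, as during a live migration.
        self.bindings.setdefault(port_id, {})[host] = {"status": "INACTIVE"}

    def activate_binding(self, port_id, host):
        self.bindings[port_id][host]["status"] = "ACTIVE"

    def delete_binding(self, port_id, host):
        self.bindings.get(port_id, {}).pop(host, None)


def rollback_live_migration(neutron, port_id, source_host, dest_host):
    """On migration failure, drop the destination binding and re-activate
    the source binding so the VM keeps working networking on the source node."""
    neutron.delete_binding(port_id, dest_host)
    # This is the step the bug suggests is missing: without it, the port is
    # left with no active host binding, so host_id and vif_type come back empty.
    neutron.activate_binding(port_id, source_host)
```

With this model, rolling back a failed migration leaves the port with exactly one active binding on the source host, which is the pre-migration state the "Expected result" section below describes.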
Steps to reproduce
==================
A chronological list of steps which will reproduce the issue:
deploy a two-node devstack with ovs on the controller and ovs-dpdk on the second node.
configure nova-compute with the libvirt/kvm virt type on both nodes.
allocate hugepages on both nodes and boot a VM on either node.
attempt to live migrate the VM to the other node.
Expected result
===============
If QEMU has a bug that results in a migration failure,
the vif bindings should be restored to their pre-migration
state, and if networking cannot be restored the VM should go directly
to the error state.
Actual result
=============
the vm remained running on the source node with state active.
the vm interface no longer had a host_id set in the neutron vif:binding-details
and the vif type was none.
on hard reboot the vm entered error state with the following message
Error: Failed to perform requested operation on instance "dpdk", the instance has an error status: Please try again later [Error: vif_type parameter must be present for this vif_driver implementation].
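The reboot failure above follows from the cleared binding. A hedged illustration (not nova's code; `plug_vif` is a hypothetical stand-in for the vif driver's plugging path): plugging a vif requires a concrete vif_type, and once the binding is cleared the port reports none, so the driver rejects it and the instance goes to ERROR.

```python
def plug_vif(vif_type):
    """Stand-in for a vif driver's plug step, which needs a concrete vif_type."""
    if not vif_type or vif_type == "unbound":
        # Mirrors the error message seen in the bug report.
        raise ValueError(
            "vif_type parameter must be present for this vif_driver "
            "implementation")
    return "plugged:" + vif_type
```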
Environment
===========
1. openstack rocky RC1
nova sha: afe4512bf66c89a
neutron sha: 1dda2bca862b126
2. Which hypervisor did you use?
libvirt + kvm
libvirtd (libvirt) 3.9.0
QEMU emulator version 2.10.0
CentOS 7.5
3. Which storage type did you use?
boot from volume, using the default lvm backend deployed by devstack.
4. Which networking type did you use?
neutron ml2/ovs with ovs and ovs-dpdk
(this is not really relevant to the bug.)
Logs & Configs
==============
n-cpu log
Aug 16 12:19:00 devstack5 nova-compute[
Changed in nova:
importance: Undecided → Medium
tags: added: rocky-rc-potential
tags: removed: rocky-rc-potential
What clears out the vif binding host_id and vif_type during the live migration? I would expect those to always be set. Is it something to do with rolling back (deleting) the destination host port binding but not activating the source host port binding?
https://github.com/openstack/nova/blob/722d5b477219f0a2435a9f4ad4d54c61b83219f1/nova/compute/manager.py#L6908
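In terms of neutron's binding-extended API, the rollback described in the comment above would amount to two calls: deleting the destination host binding and activating the source host binding. A small sketch of the request pairs involved (the paths follow the neutron API reference for port bindings; the helper itself is hypothetical, and a real client would send these over HTTP with auth):

```python
def rollback_binding_requests(port_id, source_host, dest_host):
    """Return the (method, path) pairs a correct rollback would issue."""
    base = "/v2.0/ports/{}/bindings".format(port_id)
    return [
        # Drop the never-used destination binding.
        ("DELETE", "{}/{}".format(base, dest_host)),
        # Re-activate the source binding so host_id/vif_type are restored.
        ("PUT", "{}/{}/activate".format(base, source_host)),
    ]
```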