VMs inaccessible after live migration on certain Arista VXLAN Flood and Learn fabrics

Bug #1996995 reported by Aaron S
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
This is not a Nova bug per se, but rather an issue with Arista and potentially other network fabrics.

I have observed a case where VMs are inaccessible via the network after live migration on certain fabrics (in this case, Arista VXLAN), despite the hypervisor sending out a number of GARP packets following the migration.

This was observed on an Arista VXLAN fabric when live migrating a VM between hypervisors on two different switches. A live migration between two hypervisors on the same switch is not affected.

In both cases, I can see GARPs on the wire triggered by the VM being live migrated; these packets have been observed from other hypervisors and even from other VMs in the same VLAN on different hypervisors.

The VM becomes accessible again after a period of time, at the point the switch's ARP aging timer resets and the MAC is re-learnt on the correct switch.
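For anyone trying to verify this behaviour, the GARPs can be watched for on the wire. Below is a minimal sketch using scapy, assuming it is installed and that "eth0" is a placeholder for the relevant capture interface; this is a debugging aid only, not part of nova or the patch.

```python
# Minimal sketch: watch for gratuitous ARPs during a live migration.
# Assumptions: scapy is installed; "eth0" is a placeholder interface name.
from scapy.all import ARP, sniff

def show_garp(pkt):
    if ARP not in pkt:
        return
    arp = pkt[ARP]
    # A gratuitous ARP announces the sender's own address: sender IP == target IP.
    if arp.psrc == arp.pdst:
        print(f"GARP from {arp.hwsrc} announcing {arp.psrc}")

sniff(iface="eth0", filter="arp", prn=show_garp, store=False)
```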

This occurs on any VM - even a simple c1.m1 with no active workload, backed by Ceph storage.

Steps to Reproduce
==================

To try to prevent this from happening, I tested the "libvirt: Add announce-self post live-migration workaround" patch [0]; despite this, the issue was still observed.

Create VM: c1.m1 or similar, CentOS 7 or CentOS 8, Ceph storage, no active or significant load on the VM

Run:
`ping VM_IP | while read ping; do echo "$(date): $ping"; done`

Then:
`openstack server migrate --live TARGET_HOST VM_INSTANCE`

Expected result
===============
VM live migrates and is accessible within a reasonable (<10 second) timeframe

Actual result
=============
VM live migrates successfully, but ping fails until the switch ARP timer resets (60-180 seconds in our environment)

Despite efforts from us and our network team, we have been unable to determine why the VM is inaccessible. What we have noticed is that sending a further number of announce_self commands to the QEMU monitor, triggering more GARPs, gets the VM into an accessible state within an acceptable time of <5 seconds.
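For reference, an announce_self can be issued by hand against a running guest to confirm the effect. A minimal sketch, assuming the libvirt-python bindings (including the libvirt_qemu module) are available and that "instance-00000001" is a placeholder domain name:

```python
# Minimal sketch: manually ask QEMU to re-send its self-announcement
# (GARP/RARP) packets for a guest's NICs.
# Assumptions: libvirt-python with the libvirt_qemu module is installed;
# "instance-00000001" is a placeholder domain name.
import libvirt
import libvirt_qemu

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000001")

# "announce_self" is the HMP form of the QEMU announce command.
libvirt_qemu.qemuMonitorCommand(
    dom, "announce_self", libvirt_qemu.VIR_DOMAIN_QEMU_MONITOR_COMMAND_HMP)
conn.close()
```

The virsh equivalent is `virsh qemu-monitor-command --hmp DOMAIN announce_self`.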

Environment
=============
Arista EOS 4.26M VXLAN fabric
OpenStack Nova Train, Ussuri, Victoria (with and without the announce-self patch [0])
Ceph Nautilus

OpenStack provider networking, using VLANs

Patch/Workaround
================
I have prepared a follow-up workaround patch, building on the announce-self patch [0], which we have been running in our production deployment.

This patch adds two configuration options and the associated code (a sketch of the combined logic follows the list):

`enable_qemu_monitor_announce_max_retries` - this will call announce_self a further n times, triggering additional GARP packets to be sent.

`enable_qemu_monitor_announce_retry_interval` - the delay, in seconds, used between the additional announce_self calls configured by the option above.
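As a rough illustration of how the two options combine, here is a minimal sketch of the retry loop; announce_self here is a hypothetical stand-in for the QEMU monitor call, and the merged nova code differs in detail.

```python
import time

# Minimal sketch of the retry logic controlled by the two options above.
# announce_self is a hypothetical callable that issues the QEMU monitor
# command; the actual nova implementation differs in detail.
def announce_with_retries(announce_self, max_retries, retry_interval):
    announce_self()  # the initial announce from the existing workaround
    for _ in range(max_retries):
        time.sleep(retry_interval)
        announce_self()  # each retry triggers another burst of GARPs
```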

My tests across nearly 5,000 live migrations show that the optimal settings in our environment are 3 additional calls to qemu_announce_self with a 1 second delay; this gets our VMs accessible within 2-3 seconds in the vast majority of cases, and within 5 seconds in 99% of cases, measured from the point they stop responding to ping (the point at which we deem them inaccessible).
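For completeness, this is roughly how those settings would look in nova.conf, assuming the option names land as described above (the merged patch should be treated as authoritative for the final names):

```ini
[workarounds]
# Enables the existing announce-self workaround [0].
enable_qemu_monitor_announce_self = True
# Option names as proposed here; check the merged patch for final names.
enable_qemu_monitor_announce_max_retries = 3
enable_qemu_monitor_announce_retry_interval = 1
```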

I shall be submitting this patch for review by the Nova community in the next few days.

0: https://opendev.org/openstack/nova/commit/9609ae0bab30675e184d1fc63aec849c1de020d0

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/867324

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/867324
Committed: https://opendev.org/openstack/nova/commit/fba851bf3a34562db9cdb783ae539556b8b7a329
Submitter: "Zuul (22348)"
Branch: master

commit fba851bf3a34562db9cdb783ae539556b8b7a329
Author: as0 <email address hidden>
Date: Tue Dec 13 09:43:38 2022 +0000

    Add further workaround features for qemu_monitor_announce_self

    In some cases on Arista VXLAN fabrics, VMs are inaccessible via network
    after live migration, despite garps being observed on the fabric itself.

    This patch builds on the ``[workarounds]/enable_qemu_monitor_announce_self``
    feature introduced for `bug 1815989
    <https://bugs.launchpad.net/nova/+bug/1815989>`_.

    This patch adds the ability to configure the number of times the QEMU
    announce_self monitor command is called, and adds a new configuration
    option to specify a delay between those calls, as in some cases multiple
    announce_self monitor commands are required before the fabric honours
    the garp packets and the VM becomes accessible via the network after
    live migration.

    Closes-Bug: #1996995
    Change-Id: I2f5bf7c9de621bb1dc7fae5b3374629a4fcc1f46

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.0.0.0rc1

This issue was fixed in the openstack/nova 27.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (unmaintained/yoga)

Fix proposed to branch: unmaintained/yoga
Review: https://review.opendev.org/c/openstack/nova/+/926630
