[RFE][OVN] Create an intermediate OVS bridge between VM and integration bridge to improve the live-migration process

Bug #1933517 reported by Rodolfo Alonso
This bug affects 2 people
Affects   Status         Importance   Assigned to      Milestone
neutron   Fix Released   Medium       Rodolfo Alonso
os-vif    Fix Released   Medium       sean mooney

Bug Description

When live-migrating network-sensitive VMs, communication is interrupted.

This is similar to [1], but in OVN the vif-plugged events are controlled directly by the Neutron server, not by the OVS/DHCP agents.

The problem lies in the timing with which the destination chassis creates the OpenFlow (OF) rules needed for the destination VM port. As in OVS, the VM port is created when the instance is unpaused. At that moment the VM resumes sending packets through the interface, but OVN has not yet finished the configuration.

Related BZs:
- OSP16.1: https://bugzilla.redhat.com/show_bug.cgi?id=1903653
- OSP16.1: https://bugzilla.redhat.com/show_bug.cgi?id=1872937
- OSP16.1: https://bugzilla.redhat.com/show_bug.cgi?id=1966512

[1] https://bugs.launchpad.net/neutron/+bug/1901707

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
importance: Undecided → Low
importance: Low → Medium
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Potential solution: add a trunk bridge per port, connected to br-int through a patch port. This is similar to the hybrid plug, but with an OVS bridge instead of a Linux bridge.

This solution:
- Gives us a port ID that can be used to configure the flows.
- Is 100% compatible with DPDK.
- Does not hurt performance: OVS will collapse the created bridge and the datapath will be the same.
- Can be enabled/disabled.
- Is being added to os-vif in an ongoing effort, as with the hybrid plug.

Cons:
- OVS QoS won't work directly on the patch port connected to the VM port bridge. It will need the reference to the VM port.
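
The proposed topology can be sketched as the ovs-vsctl calls that would build it. This is only an illustration, not the os-vif implementation: the function name, the "pb"/"pi"/"pp" naming scheme and the example port ID are assumptions made for this sketch. The function only builds argument lists and executes nothing. The tap device is left out on purpose, since libvirt would attach it to the per-port bridge later.

```python
from typing import List


def per_port_bridge_commands(port_id: str,
                             int_bridge: str = "br-int") -> List[List[str]]:
    """Build ovs-vsctl argument lists for a per-port bridge (hypothetical helper).

    Nothing is executed; the caller decides whether to run the commands.
    """
    suffix = port_id[:11]        # assumed naming scheme, kept short
    port_bridge = f"pb{suffix}"  # per-port bridge
    int_patch = f"pi{suffix}"    # patch port end on the integration bridge
    port_patch = f"pp{suffix}"   # patch port end on the per-port bridge
    return [
        # 1. Create the per-port bridge.
        ["ovs-vsctl", "--may-exist", "add-br", port_bridge],
        # 2. Patch port on br-int. It carries the Neutron port ID in
        #    external_ids:iface-id so that OVN can bind the logical port.
        ["ovs-vsctl", "--may-exist", "add-port", int_bridge, int_patch,
         "--", "set", "Interface", int_patch, "type=patch",
         f"options:peer={port_patch}",
         f"external_ids:iface-id={port_id}"],
        # 3. Peer patch port on the per-port bridge.
        ["ovs-vsctl", "--may-exist", "add-port", port_bridge, port_patch,
         "--", "set", "Interface", port_patch, "type=patch",
         f"options:peer={int_patch}"],
    ]


if __name__ == "__main__":
    # Example port ID, made up for this sketch.
    for cmd in per_port_bridge_commands("3f2c1a9e-7b4d-4c1e-9a0f-123456789abc"):
        print(" ".join(cmd))
```

Because the patch port on br-int exists before the VM is unpaused, the flows can be keyed on it ahead of time, which is the point of the first bullet above.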

Akihiro Motoki (amotoki)
tags: added: ovn
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Thank you for your feedback in the Neutron meeting. I'll present a spec ASAP.

summary: - [OVN] Live migration of network sensitive VMs breaks communication
+ [RFE][OVN] Live migration of network sensitive VMs breaks communication
Akihiro Motoki (amotoki) wrote : Re: [RFE][OVN] Live migration of network sensitive VMs breaks communication

As discussed in the team meeting on Jun 29, we will handle this as an RFE.

tags: added: rfe-triaged
tags: added: rfe
removed: rfe-triaged
tags: added: rfe-triaged
removed: rfe
summary: - [RFE][OVN] Live migration of network sensitive VMs breaks communication
+ [RFE][OVN] Create an intermediate OVS bridge between VM and integration
+ bridge to improve the live-migration process
Akihiro Motoki (amotoki) wrote :

The context for proposing an intermediate OVS bridge can be found in the Neutron team meeting log from Jun 29.
https://meetings.opendev.org/meetings/networking/2021/networking.2021-06-29-14.00.log.html#l-155

Slawek Kaplonski (slaweq) wrote :

Let's discuss it at the next drivers meeting on 02.07.2021.

Changed in os-vif:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → sean mooney (sean-k-mooney)
Changed in neutron:
status: New → Triaged
Changed in os-vif:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-specs (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron-specs/+/799198

Slawek Kaplonski (slaweq) wrote :

This RFE was discussed and approved at the drivers meeting today.

tags: added: rfe-approved
removed: rfe-triaged
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-vif (master)

Reviewed: https://review.opendev.org/c/openstack/os-vif/+/798055
Committed: https://opendev.org/openstack/os-vif/commit/b837c1a74f37191692a820711e431a75516a4abf
Submitter: "Zuul (22348)"
Branch: master

commit b837c1a74f37191692a820711e431a75516a4abf
Author: Sean Mooney <email address hidden>
Date: Fri Jun 25 07:50:26 2021 +0000

    add configurable per port bridges

    This patch adds a new configuration option to use
    per port bridge when hybrid_plug is false.
    This can be used with OVN to reduce packet loss
    during a live migration.

    OVN can only install openflow rules when a port both has
    external_ids set and an ofport-id assigned.
    Since the ofport-id is only assigned when a netdev matching
    the port name exists connected to the dataplane, OVN cannot
    install the flows until libvirt creates the tap on the destination
    host during a live migration.

    On loaded systems this can result in multiple seconds of packet loss.
    To address this we introduce per port bridges which are connected
    to the integration bridge by a patch port pair. Since the patch port
    will exist on the dataplane during pre live migration OVN can install
    the flows on the integration bridge before we begin the migration reducing
    or avoiding packet loss.

    Change-Id: I0d55ccbef5b585330b5512e67e442b80304a2e73
    Depends-On: https://review.opendev.org/c/openstack/nova/+/797428
    Closes-Bug: #1933517
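
The condition described in the commit message above can be illustrated with a toy model. This is a minimal sketch and not OVN or os-vif code: the `OvsInterface` class, the `flows_installable` helper and the interface names are all hypothetical, and the only rule modelled is the one stated in the commit message (both the external_ids and an ofport must be present).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OvsInterface:
    """Toy stand-in for an OVS interface record (hypothetical)."""
    name: str
    external_ids: Dict[str, str] = field(default_factory=dict)
    # None models "no ofport assigned yet", e.g. no matching netdev exists.
    ofport: Optional[int] = None


def flows_installable(iface: OvsInterface) -> bool:
    """True only when the port ID is set AND an ofport has been assigned."""
    has_port_id = "iface-id" in iface.external_ids
    has_ofport = iface.ofport is not None and iface.ofport > 0
    return has_port_id and has_ofport


# Without per port bridges: the port record exists on the destination during
# pre live migration, but libvirt has not created the tap yet, so no ofport.
tap = OvsInterface("tap-demo", {"iface-id": "port-1"})

# With per port bridges: the patch port already exists on the dataplane, so
# it has an ofport before the migration starts.
patch = OvsInterface("patch-demo", {"iface-id": "port-1"}, ofport=7)

print(flows_installable(tap), flows_installable(patch))
```

In this model the tap case stays blocked until the VM is about to run, while the patch port case is ready during pre live migration, which matches the reasoning given for the reduced packet loss.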

Changed in os-vif:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-vif (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/os-vif/+/802475

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-vif 2.6.0

This issue was fixed in the openstack/os-vif 2.6.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-specs (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-specs/+/799198
Committed: https://opendev.org/openstack/neutron-specs/commit/81f4ecb9d4b2218755908ddc855456297932a6f6
Submitter: "Zuul (22348)"
Branch: master

commit 81f4ecb9d4b2218755908ddc855456297932a6f6
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri Jul 2 11:33:23 2021 +0000

    Create intermediate OVS bridge to improve live-migration in OVN

    This spec proposes to add an intermediate bridge between the VM patch
    port and the integration bridge. That will allow the backend (OVN)
    to properly configure the needed OpenFlow rules before the VM
    is unpaused in the destination host. That will reduce the
    networking disruption during the live migration process.

    Change-Id: I558523a8922567efb0739173c7c2fda72504a8fe
    Related-Bug: #1933517

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-vif (stable/wallaby)

Change abandoned by "sean mooney <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/os-vif/+/802475

Brian Haley (brian-haley) wrote :

Looks like this was fixed with the os-vif change, will close. Please re-open if necessary.

Changed in neutron:
status: Triaged → Fix Released
norman shen (jshen28) wrote :

Sorry for adding new comments to this closed bug report, but I am curious what the current live migration packet loss benchmarks are for hybrid, openvswitch and OVN individually.

I saw the following comments in the Neutron meeting log and quote them here: "
14:46:36 <ralonsoh> we have a serious problem with OVN live migrations
14:46:45 <ralonsoh> in OVS with hybrid plug we are ok
14:46:59 <ralonsoh> because os-vif creates a bridge between the VM and OVS
14:47:10 <ralonsoh> so Neutron is aware of this new port and creates the needed rules
14:47:34 <ralonsoh> in OVN and OVS native, this is not happening
14:47:46 <ralonsoh> because libvirt creates the port when the VM is unpaused

"

I am not sure I fully understand the issue, but it looks to me like there are two different claims.

1. In the OVN case, libvirt creates the port after the VM is resumed on the destination.
2. In the iptables hybrid case this does not happen, and neutron-ovs-agent has already set up the correct flows.

I am not currently using ovn so the first claim does not bother me.

I do want to double-check the second claim because in my deployment (OpenStack Victoria), neutron-ovs-agent does not set up the flow tables until Nova activates the port binding on the destination. I also saw quite a lot of packet loss in the iptables hybrid case (announce-self only mitigates the issue and is far from ideal).

My question is thus: could anybody kindly share live migration packet loss benchmarks? Thank you.
