Wrong representor port was unplugged from OVS during cold migration

Bug #1809095 reported by Maria Luisa Arches
This bug affects 1 person
Affects                   Status         Importance  Assigned to    Milestone
OpenStack Compute (nova)  Fix Released   Medium      Adrian Chiris
Queens                    Fix Committed  Medium      Adrian Chiris
Rocky                     Fix Committed  Medium      Adrian Chiris
Stein                     Fix Committed  Medium      Adrian Chiris

Bug Description

Description
===========
The wrong representor port was unplugged from OVS during cold migration.
This happens when the VM is scheduled to use a different PCI device on the target host
than the one it is using on the source host. Nova then uses the new PCI device information
to unplug the representor port on the source compute node.

Steps to reproduce
==================
1. Create representor ports
$ openstack port create --network private --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' direct_port1
$ openstack port create --network private --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' direct_port2
2. Create VMs using the ports created above:
$ openstack server create --flavor m1.small --image fedora24 --nic port-id=direct_port1 --availability-zone=nova:compute-1 vm1
$ openstack server create --flavor m1.small --image fedora24 --nic port-id=direct_port2 --availability-zone=nova:compute-2 vm2
3. Migrate VM2
$ openstack server migrate vm2
$ openstack server resize --confirm vm2
4. VM2 was migrated to compute-1; however, its representor port is still attached to OVS on the source host
$ sudo ovs-dpctl show
system@ovs-system:
        lookups: hit:466465 missed:5411 lost:0
        flows: 12
        masks: hit:739146 total:2 hit/pkt:1.57
        port 0: ovs-system (internal)
        port 1: br-pro0.0 (internal)
        port 2: br-pro0 (internal)
        port 3: ens6f0
        port 4: br-int (internal)
        port 5: eth3
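
The check in step 4 can be scripted. The following is a minimal sketch that parses the `ovs-dpctl show` output quoted above and lists the ports that are not marked "(internal)". The helper names are made up for this example, and the report does not say which of the listed ports is the representor, so the sketch only separates internal from non-internal ports.

```python
import re

# Matches datapath port lines such as "port 5: eth3" or
# "port 4: br-int (internal)".
PORT_LINE = re.compile(r"^\s*port\s+(\d+):\s+(\S+)(?:\s+\((\w+)\))?\s*$")


def parse_dpctl_ports(output):
    """Return {port_name: (port_number, port_type or None)}."""
    ports = {}
    for line in output.splitlines():
        match = PORT_LINE.match(line)
        if match:
            number, name, kind = match.groups()
            ports[name] = (int(number), kind)
    return ports


def non_internal_ports(output):
    """Names of ports that are not marked "(internal)", sorted."""
    return sorted(
        name
        for name, (_number, kind) in parse_dpctl_ports(output).items()
        if kind != "internal"
    )


# Output copied from the report.
SAMPLE = """\
system@ovs-system:
        lookups: hit:466465 missed:5411 lost:0
        flows: 12
        masks: hit:739146 total:2 hit/pkt:1.57
        port 0: ovs-system (internal)
        port 1: br-pro0.0 (internal)
        port 2: br-pro0 (internal)
        port 3: ens6f0
        port 4: br-int (internal)
        port 5: eth3
"""

print(non_internal_ports(SAMPLE))
```

Comparing this list before and after the migration on the source host would show whether a port was left behind or whether an unrelated one disappeared.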

Expected result
===============
After cold migration, the representor port previously used by the VM should be unplugged from OVS on the source host.

Actual result
=============
The representor port previously used by the VM is still plugged in on the source host. In some scenarios, the wrong representor port was unplugged from the source host, which affects VMs that were not cold migrated.

Environment
===========
Libvirt+KVM
$ /usr/libexec/qemu-kvm --version
QEMU emulator version 2.10.0
$ virsh --version
3.9.0
Neutron+OVS HW Offload
OpenStack Queens openstack-nova-compute-17.0.7-1

Logs & Configs
==============
1. Plug vif device using pci address 0000:81:00.5
2018-12-15 13:12:04.871 108055 DEBUG os_vif [req-cd20d9ab-e880-41fa-aee5-97b920abcf77 dd9f16f6b15740e181c9b7cf8ee5795c 52298dbce7024cf89ca9e6d7369a67de - default default] Plugging vif VIFHostDevice(active=False,address=fa:16:3e:1b:0a:21,dev_address=0000:81:00.5,dev_type='ethernet',has_traffic_filtering=True,id=38609ab2-cf36-4782-83c7-7ee2d5c1c163,network=Network(bd30c752-4876-498b-9a36-e9733b635f4f),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True) plug /usr/lib/python2.7/site-packages/os_vif/__init__.py:76

2. VM was migrated from compute-1 to compute-2. New pci device is now 0000:81:00.4
2018-12-15 13:13:58.721 108055 DEBUG os_vif [req-afd99706-cf49-4c20-b85b-ea4d990ffbb4 dd9f16f6b15740e181c9b7cf8ee5795c 52298dbce7024cf89ca9e6d7369a67de - default default] Unplugging vif VIFHostDevice(active=True,address=fa:16:3e:1b:0a:21,dev_address=0000:81:00.4,dev_type='ethernet',has_traffic_filtering=True,id=38609ab2-cf36-4782-83c7-7ee2d5c1c163,network=Network(bd30c752-4876-498b-9a36-e9733b635f4f),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True) unplug /usr/lib/python2.7/site-packages/os_vif/__init__.py:109
2018-12-15 13:13:58.759 108055 INFO os_vif [req-afd99706-cf49-4c20-b85b-ea4d990ffbb4 dd9f16f6b15740e181c9b7cf8ee5795c 52298dbce7024cf89ca9e6d7369a67de - default default] Successfully unplugged vif VIFHostDevice(active=True,address=fa:16:3e:1b:0a:21,dev_address=0000:81:00.4,dev_type='ethernet',has_traffic_filtering=True,id=38609ab2-cf36-4782-83c7-7ee2d5c1c163,network=Network(bd30c752-4876-498b-9a36-e9733b635f4f),plugin='ovs',port_profile=VIFPortProfileOVSRepresentor,preserve_on_delete=True)

vif_plug_ovs used the new information passed by Nova to unplug the representor port:
https://github.com/openstack/os-vif/blob/db5216357b1be93d91aa48b2878599f2dfef02a8/vif_plug_ovs/ovs.py#L299
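
The effect of unplugging with the destination PCI address can be shown with a toy model. This is only an illustrative sketch, not the os-vif code: the representor names, the `unplug_representor` helper, and the assumption that VF 0000:81:00.4 on the source host belongs to a second VM are all hypothetical. Only the two PCI addresses come from the logs above.

```python
def unplug_representor(bridge_ports, vf_to_representor, pci_address):
    """Remove the representor of the VF at pci_address from the bridge.

    Returns the name of the representor that was removed, or None.
    """
    representor = vf_to_representor.get(pci_address)
    if representor in bridge_ports:
        bridge_ports.remove(representor)
        return representor
    return None


# Hypothetical source-host state: two VFs, each with a representor on br-int.
VF_TO_REPRESENTOR = {
    "0000:81:00.4": "rep_other_vm",     # assumed: VF of a VM that stays put
    "0000:81:00.5": "rep_migrated_vm",  # VF the migrated VM was plugged with
}


def simulate(pci_address_used_for_unplug):
    """Return (removed representor, representors left on the bridge)."""
    bridge_ports = {"rep_other_vm", "rep_migrated_vm"}
    removed = unplug_representor(
        bridge_ports, VF_TO_REPRESENTOR, pci_address_used_for_unplug
    )
    return removed, sorted(bridge_ports)


# Unplug with the destination address, as in the logs above.
buggy = simulate("0000:81:00.4")
# Unplug with the address the VIF was originally plugged with.
fixed = simulate("0000:81:00.5")

print("destination address:", buggy)
print("source address:     ", fixed)
```

In this model the unplug that uses the destination address removes the other VM's representor and leaves the migrated VM's representor on the bridge, which matches both symptoms described under "Actual result".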

Matt Riedemann (mriedem)
tags: added: pci resize
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/630978

Changed in nova:
assignee: nobody → Maria Luisa Arches (arches)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/643023

Changed in nova:
assignee: Maria Luisa Arches (arches) → Adrian Chiris (adrian.chiris)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/643024

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Maria Luisa Arches (<email address hidden>) on branch: master
Review: https://review.openstack.org/630978
Reason: Abandoned, Adrian has another proposed fix for this

Revision history for this message
sean mooney (sean-k-mooney) wrote :

triaged as medium.
i think this is a valid bug and i think it's possible for the cold migration of
one instance to affect the connectivity of another on its previous host.

because of the potential for cross-tenant interaction i would normally set this
as high; however, i believe this will only affect deployments with hardware offloaded ovs,
which are a small percentage of deployments even when compared to sriov deployments.

as such i have lowered the importance to medium, as a patch is already in flight
and we have not seen many end users report or hit this bug in the 4 months it has been open.

Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/643023
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5a1c385b996090b80f5881680e04c88abc21828a
Submitter: Zuul
Branch: master

commit 5a1c385b996090b80f5881680e04c88abc21828a
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:19:04 2019 +0200

    Move get_pci_mapping_for_migration to MigrationContext

    In order to fix Bug #1809095, it is required to update
    PCI related VIFs with the original PCI address on the source
    host to allow virt driver to properly unplug the VIF from hypervisor,
    e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    To do so, some preliminary work is needed to allow code-sharing
    between nova.network.neutronv2 and nova.compute.manager

    This change:
    - Moves common logic to retrieve the PCI mapping between
      the source and destination node from nova.network.neutronv2
      to objects.migration_context.
    - Makes code adjustments to methods in nova.network.neutronv2
      to accommodate the former.

    Change-Id: I9a5118373548c525b2b1c2271e7d210cc92e4f4c
    Partial-Bug: #1809095
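
The PCI mapping described in this commit message can be sketched as follows. This is a simplified, standalone illustration and not the Nova implementation: the `PciDevice` class and the `pci_mapping_for_migration` function are hypothetical, and pairing devices by a shared PCI request ID is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PciDevice:
    address: str     # PCI address, e.g. "0000:81:00.5"
    request_id: str  # ID of the PCI request that claimed it (assumed key)


def pci_mapping_for_migration(old_devices, new_devices):
    """Map each source-host PCI address to its destination-host device.

    Devices are paired when they were claimed for the same PCI request.
    Source devices without a counterpart are left out of the mapping.
    """
    new_by_request = {dev.request_id: dev for dev in new_devices}
    return {
        old.address: new_by_request[old.request_id]
        for old in old_devices
        if old.request_id in new_by_request
    }


# Addresses taken from the logs in the bug description; the request ID
# "req-1" is made up.
old_devices = [PciDevice("0000:81:00.5", "req-1")]
new_devices = [PciDevice("0000:81:00.4", "req-1")]

mapping = pci_mapping_for_migration(old_devices, new_devices)
print(mapping["0000:81:00.5"].address)
```

With such a mapping available to both the network API and the compute manager, the source host can recover the original address of each VIF instead of relying on the address stored after the move.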

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/643024
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=000e93df09fd4941bb69f715c955b940871a1ec6
Submitter: Zuul
Branch: master

commit 000e93df09fd4941bb69f715c955b940871a1ec6
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:34:35 2019 +0200

    Allow driver to properly unplug VIFs on destination on confirm resize

    Update PCI related VIFs with the original PCI address on the source
    host to allow the virt driver to properly unplug VIFs from
    hypervisor, e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    While other approaches are possible for solving the issue,
    The approach proposed in the series allows the fix to be safely
    backported.

    Change-Id: Id3c4d839fb1a6da47cfb366b65c0904d281a218f
    Closes-Bug: #1809095
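
The idea in this commit message, restoring the original PCI address on a VIF before it is unplugged on the source host, can be sketched like this. It is a hedged illustration only: VIFs are modelled as plain dictionaries, the `pci_slot` key inside `profile` is assumed to hold the VF address, and `vifs_for_source_unplug` is a hypothetical helper, not a Nova function.

```python
import copy


def vifs_for_source_unplug(vifs, dest_to_source_pci):
    """Return copies of vifs whose PCI address points at the source host.

    dest_to_source_pci maps a destination-host PCI address to the address
    the same VIF used on the source host. VIFs without a matching address
    are returned unchanged.
    """
    result = []
    for vif in vifs:
        vif = copy.deepcopy(vif)  # never mutate the caller's network info
        profile = vif.get("profile") or {}
        source_addr = dest_to_source_pci.get(profile.get("pci_slot"))
        if source_addr is not None:
            profile["pci_slot"] = source_addr
        result.append(vif)
    return result


# The VIF ID and PCI addresses come from the logs in the bug description;
# the second VIF is a made-up port without a PCI device.
current_vifs = [
    {"id": "38609ab2-cf36-4782-83c7-7ee2d5c1c163",
     "profile": {"pci_slot": "0000:81:00.4"}},
    {"id": "non-pci-port", "profile": {}},
]
dest_to_source = {"0000:81:00.4": "0000:81:00.5"}

unplug_vifs = vifs_for_source_unplug(current_vifs, dest_to_source)
print(unplug_vifs[0]["profile"]["pci_slot"])
```

Working on copies keeps the instance's current network information intact while the driver on the source host receives the address it originally plugged.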

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/661494

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/661495

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/661499

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/661500

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/661571

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/661572

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/661494
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=84bb00a86da539183211364961ada2c1b1bb5edc
Submitter: Zuul
Branch: stable/stein

commit 84bb00a86da539183211364961ada2c1b1bb5edc
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:19:04 2019 +0200

    Move get_pci_mapping_for_migration to MigrationContext

    In order to fix Bug #1809095, it is required to update
    PCI related VIFs with the original PCI address on the source
    host to allow virt driver to properly unplug the VIF from hypervisor,
    e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    To do so, some preliminary work is needed to allow code-sharing
    between nova.network.neutronv2 and nova.compute.manager

    This change:
    - Moves common logic to retrieve the PCI mapping between
      the source and destination node from nova.network.neutronv2
      to objects.migration_context.
    - Makes code adjustments to methods in nova.network.neutronv2
      to accommodate the former.

    Change-Id: I9a5118373548c525b2b1c2271e7d210cc92e4f4c
    Partial-Bug: #1809095
    (cherry picked from commit 5a1c385b996090b80f5881680e04c88abc21828a)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/661495
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=77a339c4e8d51de82cd6c4530b8473cc8bb6aea8
Submitter: Zuul
Branch: stable/stein

commit 77a339c4e8d51de82cd6c4530b8473cc8bb6aea8
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:34:35 2019 +0200

    Allow driver to properly unplug VIFs on destination on confirm resize

    Update PCI related VIFs with the original PCI address on the source
    host to allow the virt driver to properly unplug VIFs from
    hypervisor, e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    While other approaches are possible for solving the issue,
    The approach proposed in the series allows the fix to be safely
    backported.

    Closes-Bug: #1809095

    Conflicts:
        nova/tests/unit/compute/test_compute_mgr.py

    Change-Id: Id3c4d839fb1a6da47cfb366b65c0904d281a218f
    (cherry picked from commit 000e93df09fd4941bb69f715c955b940871a1ec6)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/661499
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=28e7be8c8be5609ee33ff3eedacd20a70b9a409d
Submitter: Zuul
Branch: stable/rocky

commit 28e7be8c8be5609ee33ff3eedacd20a70b9a409d
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:19:04 2019 +0200

    Move get_pci_mapping_for_migration to MigrationContext

    In order to fix Bug #1809095, it is required to update
    PCI related VIFs with the original PCI address on the source
    host to allow virt driver to properly unplug the VIF from hypervisor,
    e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    To do so, some preliminary work is needed to allow code-sharing
    between nova.network.neutronv2 and nova.compute.manager

    This change:
    - Moves common logic to retrieve the PCI mapping between
      the source and destination node from nova.network.neutronv2
      to objects.migration_context.
    - Makes code adjustments to methods in nova.network.neutronv2
      to accommodate the former.

    Partial-Bug: #1809095

    Conflicts:
        nova/network/neutronv2/api.py

    Change-Id: I9a5118373548c525b2b1c2271e7d210cc92e4f4c
    (cherry picked from commit 84bb00a86da539183211364961ada2c1b1bb5edc)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/661500
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=841d7d329df6832dcadcd9a1d59b1cf7c758209b
Submitter: Zuul
Branch: stable/rocky

commit 841d7d329df6832dcadcd9a1d59b1cf7c758209b
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:34:35 2019 +0200

    Allow driver to properly unplug VIFs on destination on confirm resize

    Update PCI related VIFs with the original PCI address on the source
    host to allow the virt driver to properly unplug VIFs from
    hypervisor, e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    While other approaches are possible for solving the issue,
    The approach proposed in the series allows the fix to be safely
    backported.

    Closes-Bug: #1809095

    Conflicts in unit tests were trivial to solve
    no changes in test logic.

    Conflicts:
        nova/tests/unit/compute/test_compute.py
        nova/tests/unit/compute/test_compute_mgr.py

    Change-Id: Id3c4d839fb1a6da47cfb366b65c0904d281a218f
    (cherry picked from commit 77a339c4e8d51de82cd6c4530b8473cc8bb6aea8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.1

This issue was fixed in the openstack/nova 18.2.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/661571
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8519b1ce6d0d001a57f57cf0a9ffcfdd2b6bc813
Submitter: Zuul
Branch: stable/queens

commit 8519b1ce6d0d001a57f57cf0a9ffcfdd2b6bc813
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:19:04 2019 +0200

    Move get_pci_mapping_for_migration to MigrationContext

    In order to fix Bug #1809095, it is required to update
    PCI related VIFs with the original PCI address on the source
    host to allow virt driver to properly unplug the VIF from hypervisor,
    e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    To do so, some preliminary work is needed to allow code-sharing
    between nova.network.neutronv2 and nova.compute.manager

    This change:
    - Moves common logic to retrieve the PCI mapping between
      the source and destination node from nova.network.neutronv2
      to objects.migration_context.
    - Makes code adjustments to methods in nova.network.neutronv2
      to accommodate the former.

    Partial-Bug: #1809095

    Change-Id: I9a5118373548c525b2b1c2271e7d210cc92e4f4c
    (cherry picked from commit 84bb00a86da539183211364961ada2c1b1bb5edc)
    (cherry picked from commit 28e7be8c8be5609ee33ff3eedacd20a70b9a409d)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/661572
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ab445a027a6262eb5d54f82edf76e8d925ac427e
Submitter: Zuul
Branch: stable/queens

commit ab445a027a6262eb5d54f82edf76e8d925ac427e
Author: Adrian Chiris <email address hidden>
Date: Tue Mar 12 14:34:35 2019 +0200

    Allow driver to properly unplug VIFs on destination on confirm resize

    Update PCI related VIFs with the original PCI address on the source
    host to allow the virt driver to properly unplug VIFs from
    hypervisor, e.g allow the proper VF representor to be unplugged
    from the integration bridge in case of a hardware offloaded OVS.

    While other approaches are possible for solving the issue,
    The approach proposed in the series allows the fix to be safely
    backported.

    Closes-Bug: #1809095

    Change-Id: Id3c4d839fb1a6da47cfb366b65c0904d281a218f
    (cherry picked from commit 77a339c4e8d51de82cd6c4530b8473cc8bb6aea8)
    (cherry picked from commit 841d7d329df6832dcadcd9a1d59b1cf7c758209b)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.11

This issue was fixed in the openstack/nova 17.0.11 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.
