Port update exception on nova unshelve for instance with PCI devices

Bug #1677621 reported by Steven Webster
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Steven Webster
Newton                    Fix Committed  Medium      Matt Riedemann
Ocata                     Fix Committed  Medium      Lee Yarwood

Bug Description

Description
===========
If an instance with PCI devices (SRIOV or passthrough) is shelved, unshelving it raises a port update exception and the instance goes into the Error state.

The nova API exception message is similar to:

"Unable to correlate PCI slot 0000:0d:00.1"

Steps to reproduce
==================
1. Launch an instance with SRIOV or PCI passthrough port bindings.

2. nova shelve <instance_uuid>

-- wait for nova instance status SHELVED_OFFLOADED --

3. nova unshelve <instance_uuid>

Expected result
===============
If there are resources available, the instance should be able to claim PCI devices and successfully (re)launch.

Actual result
=============
- Instance in error state
- Exception in nova api logs.

Environment
===========
1. Exact version of OpenStack you are running: Ocata, devstack

2. Which hypervisor did you use? Libvirt + KVM

3. Which storage type did you use? LVM

4. Which networking type did you use? Neutron, OVS

Changed in nova:
assignee: nobody → Steven Webster (swebster-wr)
Changed in nova:
status: New → In Progress
tags: added: pci
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/453938

Jay Pipes (jaypipes)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/453938
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c2ff276c841934ff147aab836a4bd099297fb46b
Submitter: Jenkins
Branch: master

commit c2ff276c841934ff147aab836a4bd099297fb46b
Author: Steven Webster <email address hidden>
Date: Mon Mar 27 12:18:23 2017 -0400

    Fix port update exception when unshelving an instance with PCI devices

    It is possible that _update_port_binding_for_instance() is called
    without a migration object, such as when a user unshelves an instance.

    If the instance has a port(s) with a PCI device binding, the current
    logic extracts a pci mapping from old to new devices from the migration
    object and migration context. If a 'new' device is not found in the
    PCI mapping, an exception is thrown.

    In the case of an unshelve, there is no migration object (or migration
    context), and as such we have an empty pci mapping.

    This fix will only check for a new device if we have a migration object.

    Closes-Bug: 1677621
    Change-Id: I578153ca862753ef5b8041ee3853d3c7b2e2be30
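
The guard described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the nova implementation: `build_pci_binding_updates`, the port dictionary shape, and the locally defined `PortUpdateFailed` are hypothetical stand-ins. Only the error text and the "check for a new device only if a migration exists" rule come from this report.

```python
class PortUpdateFailed(Exception):
    """Hypothetical local exception for a PCI slot that cannot be mapped."""


def build_pci_binding_updates(ports, migration=None, pci_mapping=None):
    """Return {port_id: new_pci_slot} for ports whose PCI slot must change.

    ports: list of dicts with an 'id' and an optional 'pci_slot' (assumed shape).
    migration: any object, or None. None models the unshelve case.
    pci_mapping: {old_slot: new_slot}; empty when there is no migration.
    """
    pci_mapping = pci_mapping or {}
    updates = {}
    for port in ports:
        old_slot = port.get("pci_slot")
        if old_slot is None:
            # Not a PCI-backed port, nothing to remap.
            continue
        if migration is None:
            # Unshelve: no old-to-new mapping exists, so skip the lookup
            # instead of failing on an empty mapping.
            continue
        new_slot = pci_mapping.get(old_slot)
        if new_slot is None:
            raise PortUpdateFailed(
                "Unable to correlate PCI slot %s" % old_slot)
        updates[port["id"]] = new_slot
    return updates
```

Without the `migration is None` check, the unshelve path would reach the lookup with an empty mapping and raise for every PCI-backed port, which matches the failure reported above.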

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/459840

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/460173

Matt Riedemann (mriedem)
tags: added: neutron unshelve
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/460233

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/459840
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f281c18ba9aea1b2e8a36d5ae91a7acc5324ac5e
Submitter: Jenkins
Branch: stable/ocata

commit f281c18ba9aea1b2e8a36d5ae91a7acc5324ac5e
Author: Steven Webster <email address hidden>
Date: Mon Mar 27 12:18:23 2017 -0400

    Fix port update exception when unshelving an instance with PCI devices

    It is possible that _update_port_binding_for_instance() is called
    without a migration object, such as when a user unshelves an instance.

    If the instance has a port(s) with a PCI device binding, the current
    logic extracts a pci mapping from old to new devices from the migration
    object and migration context. If a 'new' device is not found in the
    PCI mapping, an exception is thrown.

    In the case of an unshelve, there is no migration object (or migration
    context), and as such we have an empty pci mapping.

    This fix will only check for a new device if we have a migration object.

    Closes-Bug: 1677621
    Change-Id: I578153ca862753ef5b8041ee3853d3c7b2e2be30
    (cherry picked from commit c2ff276c841934ff147aab836a4bd099297fb46b)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/460173
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=70c1eb689ad174b61ad915ae5384778bd536c16c
Submitter: Jenkins
Branch: stable/newton

commit 70c1eb689ad174b61ad915ae5384778bd536c16c
Author: Steven Webster <email address hidden>
Date: Mon Mar 27 12:18:23 2017 -0400

    Fix port update exception when unshelving an instance with PCI devices

    It is possible that _update_port_binding_for_instance() is called
    without a migration object, such as when a user unshelves an instance.

    If the instance has a port(s) with a PCI device binding, the current
    logic extracts a pci mapping from old to new devices from the migration
    object and migration context. If a 'new' device is not found in the
    PCI mapping, an exception is thrown.

    In the case of an unshelve, there is no migration object (or migration
    context), and as such we have an empty pci mapping.

    This fix will only check for a new device if we have a migration object.

    Conflicts:
          nova/tests/unit/network/test_neutronv2.py

    NOTE(mriedem): The conflict is due to not having change
    I818d2232f3398489be6303414585840c151e4db7 in Newton.

    Closes-Bug: 1677621
    Change-Id: I578153ca862753ef5b8041ee3853d3c7b2e2be30
    (cherry picked from commit c2ff276c841934ff147aab836a4bd099297fb46b)
    (cherry picked from commit f281c18ba9aea1b2e8a36d5ae91a7acc5324ac5e)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/460233
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44a55d65e2aecdaa8fbcd32b2cde1b81a6fe74d8
Submitter: Jenkins
Branch: master

commit 44a55d65e2aecdaa8fbcd32b2cde1b81a6fe74d8
Author: Steven Webster <email address hidden>
Date: Wed Apr 26 12:10:23 2017 -0400

    Improve comment for PCI port binding update

    This commit expands the comment block for PCI logic in
    _update_port_binding_for_instance() to explain the cases
    where a migration may or may not be present.

    Change-Id: I1e699367576fbabe78fae0949588b3f40fe08da4
    Related-Bug: #1677621

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.4

This issue was fixed in the openstack/nova 15.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.6

This issue was fixed in the openstack/nova 14.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b2

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/484381

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/484381
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b930336854bffec1bb81b6d67079a4df59e0af19
Submitter: Zuul
Branch: master

commit b930336854bffec1bb81b6d67079a4df59e0af19
Author: Steven Webster <email address hidden>
Date: Mon Jun 12 17:10:03 2017 -0400

    Fix instance evacuation with PCI devices

    update_port_binding_for_instance() now checks that a valid migration
    object exists as a parameter before any mapping between old/new PCI
    devices can occur. A migration should be present in the case of a
    cold migration, resize, or evacuation.

    An evacuation (being a special case of a rebuild) however, will not
    pass a migration to update_port_binding_for_instance, as it
    is called directly from setup_instance_network(). This calling function
    does not currently take a migration parameter, even though one will
    certainly exist for an evacuation.

    This commit adds an optional migration parameter to
    setup_instance_network_on_host() and passes any migration object to
    the port update routine.

    Closes-Bug: #1703629
    Related-Bug: #1677621
    Related-Bug: #1630698

    Change-Id: I4e394c8d275995eac4b049a7b1329ea90f2394be
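
The parameter pass-through described in this commit message can be sketched as follows. This is an illustrative sketch, not the nova code: `NetworkAPISketch` is a hypothetical class, the argument lists are simplified assumptions, and the port update routine only records its inputs instead of talking to neutron.

```python
class NetworkAPISketch:
    """Hypothetical stand-in for the network API; records port update calls."""

    def __init__(self):
        self.port_update_calls = []

    def setup_instance_network_on_host(self, context, instance, host,
                                       migration=None):
        # Forward the optional migration so that an evacuation, which does
        # have a migration record, can map old PCI devices to new ones.
        self._update_port_binding_for_instance(
            context, instance, host, migration=migration)

    def _update_port_binding_for_instance(self, context, instance, host,
                                          migration=None):
        # The real routine updates port bindings; this sketch only records
        # what it was given so the pass-through can be observed.
        self.port_update_calls.append(
            {"instance": instance, "host": host, "migration": migration})
```

Keeping `migration` optional means callers that have no migration, such as unshelve, can continue to call the function unchanged.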

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/590059

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/590062

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/590070

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/382853
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be
Submitter: Zuul
Branch: master

commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
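
The stale network info problem described in this commit message can be illustrated with a small sketch. Everything here is hypothetical: `FakeNetworkAPI`, `FakeDriver`, `rebuild_on_target`, and the data shapes are invented for illustration and are not nova's classes or signatures. The point is only the ordering: refresh the network info after the port bindings are updated for the target host, then pass the refreshed data to spawn.

```python
class FakeNetworkAPI:
    """Hypothetical network API whose PCI slot depends on the bound host."""

    def __init__(self, slot_by_host):
        self._slot_by_host = slot_by_host
        self._bound_host = None

    def setup_instance_network_on_host(self, instance, host):
        # Rebinding the port allocates the device that exists on that host.
        self._bound_host = host

    def get_instance_nw_info(self, instance):
        slot = self._slot_by_host[self._bound_host]
        return [{"port": "p1", "pci_slot": slot}]


class FakeDriver:
    """Hypothetical virt driver that remembers what it was asked to spawn."""

    def __init__(self):
        self.spawned_with = None

    def spawn(self, instance, network_info):
        self.spawned_with = network_info


def rebuild_on_target(network_api, driver, instance, target_host):
    # Update the bindings first, then re-read the network info so the
    # driver never sees device ids allocated on the source node.
    network_api.setup_instance_network_on_host(instance, target_host)
    network_info = network_api.get_instance_nw_info(instance)
    driver.spawn(instance, network_info)
    return network_info
```

If `spawn` were instead given the network info captured before the rebind, it would carry the source node's device id, which is the failure mode the commit message describes.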

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/590062
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=559de0d0a3df92ef9dca21edfaa754a2848013c3
Submitter: Zuul
Branch: stable/queens

commit 559de0d0a3df92ef9dca21edfaa754a2848013c3
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    NOTE(mriedem): The test had to be modified in the backport because
    notify_about_instance_rebuild and _check_trusted_certs didn't exist
    in Queens.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
    (cherry picked from commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/605881

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/605882

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/590059
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9c1a58dd033fa8feeb1175956d57dc90aa55acd
Submitter: Zuul
Branch: stable/pike

commit b9c1a58dd033fa8feeb1175956d57dc90aa55acd
Author: Steven Webster <email address hidden>
Date: Mon Jun 12 17:10:03 2017 -0400

    Fix instance evacuation with PCI devices

    update_port_binding_for_instance() now checks that a valid migration
    object exists as a parameter before any mapping between old/new PCI
    devices can occur. A migration should be present in the case of a
    cold migration, resize, or evacuation.

    An evacuation (being a special case of a rebuild) however, will not
    pass a migration to update_port_binding_for_instance, as it
    is called directly from setup_instance_network(). This calling function
    does not currently take a migration parameter, even though one will
    certainly exist for an evacuation.

    This commit adds an optional migration parameter to
    setup_instance_network_on_host() and passes any migration object to
    the port update routine.

    Closes-Bug: #1703629
    Related-Bug: #1677621
    Related-Bug: #1630698

    Change-Id: I4e394c8d275995eac4b049a7b1329ea90f2394be
    (cherry picked from commit b930336854bffec1bb81b6d67079a4df59e0af19)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/590070
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c5d8594cb62c1397183888133bef68fe28c62c7a
Submitter: Zuul
Branch: stable/pike

commit c5d8594cb62c1397183888133bef68fe28c62c7a
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    Conflicts:
          nova/compute/manager.py
          nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): The conflicts are due to change
    I00eab47edf1150788777300680e853a872c1db40 and change
    I752617066bb2167b49239ab9d17b0c89754a3e12 not being in Pike.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
    (cherry picked from commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be)
    (cherry picked from commit 559de0d0a3df92ef9dca21edfaa754a2848013c3)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/605881
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b3b8e5657ffe0f12e1ccfbe6d978e62a2bfdc89
Submitter: Zuul
Branch: stable/ocata

commit 8b3b8e5657ffe0f12e1ccfbe6d978e62a2bfdc89
Author: Steven Webster <email address hidden>
Date: Mon Jun 12 17:10:03 2017 -0400

    Fix instance evacuation with PCI devices

    update_port_binding_for_instance() now checks that a valid migration
    object exists as a parameter before any mapping between old/new PCI
    devices can occur. A migration should be present in the case of a
    cold migration, resize, or evacuation.

    An evacuation (being a special case of a rebuild) however, will not
    pass a migration to update_port_binding_for_instance, as it
    is called directly from setup_instance_network(). This calling function
    does not currently take a migration parameter, even though one will
    certainly exist for an evacuation.

    This commit adds an optional migration parameter to
    setup_instance_network_on_host() and passes any migration object to
    the port update routine.

    Closes-Bug: #1703629
    Related-Bug: #1677621
    Related-Bug: #1630698

    Change-Id: I4e394c8d275995eac4b049a7b1329ea90f2394be
    (cherry picked from commit b930336854bffec1bb81b6d67079a4df59e0af19)
    (cherry picked from commit b9c1a58dd033fa8feeb1175956d57dc90aa55acd)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/605882
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9e146072b6ab5c614cc4502170cc686cc8cc7bce
Submitter: Zuul
Branch: stable/ocata

commit 9e146072b6ab5c614cc4502170cc686cc8cc7bce
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    Conflicts:
          nova/compute/manager.py
          nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): The conflict is due to not having change
    Iddae8074554995df22b656bb2e9bddaec6d775cc in Ocata.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
    (cherry picked from commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be)
    (cherry picked from commit 559de0d0a3df92ef9dca21edfaa754a2848013c3)
    (cherry picked from commit c5d8594cb62c1397183888133bef68fe28c62c7a)
