nova evacuate of instances with sriov ports fails due to use of source device

Bug #1630698 reported by Paul Carlton
This bug affects 4 people
Affects                  | Status        | Importance | Assigned to    | Milestone
OpenStack Compute (nova) | Fix Released  | Medium     | Paul Carlton   |
Pike                     | Fix Committed | Medium     | Matt Riedemann |
Queens                   | Fix Committed | Medium     | Matt Riedemann |

Bug Description

When nova evacuate or host-evacuate is used to recreate instances with SR-IOV ports, the instances are allocated new device IDs on the target and neutron is updated accordingly. However, the network info data passed to the driver spawn method is not updated, so each instance tries to use the device ID it was allocated on the source node. If a pre-existing instance is using that device ID, or no such device exists on the target node, the instance will fail to start.
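
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not nova code: the function names, the `pci_slot` key, and the PCI addresses are all hypothetical and chosen for illustration. It shows that when the network info handed to the spawn step is not refreshed, the instance ends up asking for the source host's device.

```python
# Toy model of the bug; none of these names are real nova or neutron APIs.

def pick_free_device(target_pool, in_use):
    """Return the first PCI address on the target that is not in use."""
    for addr in target_pool:
        if addr not in in_use:
            return addr
    raise LookupError("no free SR-IOV device on target")


def evacuate(vif, target_pool, in_use, refresh_network_info):
    """Return the PCI address the spawn step would try to attach.

    vif is a dict whose 'pci_slot' key (hypothetical) holds the address
    the instance was using on the source host.
    """
    new_addr = pick_free_device(target_pool, in_use)
    port_binding = {"pci_slot": new_addr}  # what the network service is told
    if refresh_network_info:
        # Fixed behaviour: copy the new address into the data used by spawn.
        vif = dict(vif, pci_slot=port_binding["pci_slot"])
    # Buggy behaviour (no refresh): vif still carries the source address.
    return vif["pci_slot"]


def can_start(addr, target_pool, in_use):
    """The instance can start only if the device exists and is free."""
    return addr in target_pool and addr not in in_use


source_vif = {"pci_slot": "0000:05:10.1"}
target_pool = ["0000:81:00.2", "0000:81:00.3"]
in_use = {"0000:81:00.2"}

stale = evacuate(source_vif, target_pool, in_use, refresh_network_info=False)
fresh = evacuate(source_vif, target_pool, in_use, refresh_network_info=True)
```

In this model `stale` is the source address, which does not exist in `target_pool`, so `can_start` returns `False` for it, while `fresh` is a free device on the target.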

Changed in nova:
assignee: nobody → Paul Carlton (paul-carlton2)
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
tags: added: pci rebuild
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/382853

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Paul Carlton (paul-carlton2) → Steven Webster (swebster-wr)
Changed in nova:
assignee: Steven Webster (swebster-wr) → Maciej Kucia (maciejkucia)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/484381

Changed in nova:
assignee: Maciej Kucia (maciejkucia) → nobody
Changed in nova:
assignee: nobody → Steven Webster (swebster-wr)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/484381
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b930336854bffec1bb81b6d67079a4df59e0af19
Submitter: Zuul
Branch: master

commit b930336854bffec1bb81b6d67079a4df59e0af19
Author: Steven Webster <email address hidden>
Date: Mon Jun 12 17:10:03 2017 -0400

    Fix instance evacuation with PCI devices

    update_port_binding_for_instance() now checks that a valid migration
    object exists as a parameter before any mapping between old/new PCI
    devices can occur. A migration should be present in the case of a
    cold migration, resize, or evacuation.

    An evacuation (being a special case of a rebuild) however, will not
    pass a migration to update_port_binding_for_instance, as it
    is called directly from setup_instance_network(). This calling function
    does not currently take a migration parameter, even though one will
    certainly exist for an evacuation.

    This commit adds an optional migration parameter to
    setup_instance_network_on_host() and passes any migration object to
    the port update routine.

    Closes-Bug: #1703629
    Related-Bug: #1677621
    Related-Bug: #1630698

    Change-Id: I4e394c8d275995eac4b049a7b1329ea90f2394be
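
The shape of the change described in this commit message, an optional migration argument that is passed through to the port update routine, can be sketched roughly as follows. This is a simplified illustration only: the function names, the `pci_mapping` argument, and the dict-based ports are assumptions made for the example and do not reproduce the real nova signatures.

```python
# Hypothetical sketch; not the real nova functions or their signatures.

def update_port_binding(port, host, migration=None, pci_mapping=None):
    """Rebind a port to host; remap its PCI device only for a migration."""
    updated = dict(port, host=host)
    # Old-to-new PCI mapping is applied only when a migration is present.
    if migration is not None and pci_mapping:
        old = port.get("pci_slot")
        if old in pci_mapping:
            updated["pci_slot"] = pci_mapping[old]
    return updated


def setup_network_on_host(ports, host, migration=None, pci_mapping=None):
    """Caller that now accepts an optional migration and forwards it."""
    return [update_port_binding(p, host, migration, pci_mapping)
            for p in ports]


ports = [{"id": "p1", "host": "src", "pci_slot": "0000:05:10.1"}]
mapping = {"0000:05:10.1": "0000:81:00.3"}

# Without a migration the PCI address is left untouched.
without_migration = setup_network_on_host(ports, "dst")
# With a migration (here just a placeholder dict) the address is remapped.
with_migration = setup_network_on_host(
    ports, "dst", migration={"id": 1}, pci_mapping=mapping)
```

Making the argument optional keeps existing callers that have no migration working unchanged, while the evacuation path can supply one.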

Changed in nova:
assignee: Steven Webster (swebster-wr) → Andrey Volkov (avolkov)
Changed in nova:
assignee: Andrey Volkov (avolkov) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Paul Carlton (paul-carlton2)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/590059

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/590062

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/590070

Matt Riedemann (mriedem)
Changed in nova:
importance: Low → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/382853
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be
Submitter: Zuul
Branch: master

commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/590062
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=559de0d0a3df92ef9dca21edfaa754a2848013c3
Submitter: Zuul
Branch: stable/queens

commit 559de0d0a3df92ef9dca21edfaa754a2848013c3
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    NOTE(mriedem): The test had to be modified in the backport because
    notify_about_instance_rebuild and _check_trusted_certs didn't exist
    in Queens.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
    (cherry picked from commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0rc1

This issue was fixed in the openstack/nova 18.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.6

This issue was fixed in the openstack/nova 17.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/605881

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/605882

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/590059
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9c1a58dd033fa8feeb1175956d57dc90aa55acd
Submitter: Zuul
Branch: stable/pike

commit b9c1a58dd033fa8feeb1175956d57dc90aa55acd
Author: Steven Webster <email address hidden>
Date: Mon Jun 12 17:10:03 2017 -0400

    Fix instance evacuation with PCI devices

    update_port_binding_for_instance() now checks that a valid migration
    object exists as a parameter before any mapping between old/new PCI
    devices can occur. A migration should be present in the case of a
    cold migration, resize, or evacuation.

    An evacuation (being a special case of a rebuild) however, will not
    pass a migration to update_port_binding_for_instance, as it
    is called directly from setup_instance_network(). This calling function
    does not currently take a migration parameter, even though one will
    certainly exist for an evacuation.

    This commit adds an optional migration parameter to
    setup_instance_network_on_host() and passes any migration object to
    the port update routine.

    Closes-Bug: #1703629
    Related-Bug: #1677621
    Related-Bug: #1630698

    Change-Id: I4e394c8d275995eac4b049a7b1329ea90f2394be
    (cherry picked from commit b930336854bffec1bb81b6d67079a4df59e0af19)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/590070
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c5d8594cb62c1397183888133bef68fe28c62c7a
Submitter: Zuul
Branch: stable/pike

commit c5d8594cb62c1397183888133bef68fe28c62c7a
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    Conflicts:
          nova/compute/manager.py
          nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): The conflicts are due to change
    I00eab47edf1150788777300680e853a872c1db40 and change
    I752617066bb2167b49239ab9d17b0c89754a3e12 not being in Pike.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
    (cherry picked from commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be)
    (cherry picked from commit 559de0d0a3df92ef9dca21edfaa754a2848013c3)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/605881
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b3b8e5657ffe0f12e1ccfbe6d978e62a2bfdc89
Submitter: Zuul
Branch: stable/ocata

commit 8b3b8e5657ffe0f12e1ccfbe6d978e62a2bfdc89
Author: Steven Webster <email address hidden>
Date: Mon Jun 12 17:10:03 2017 -0400

    Fix instance evacuation with PCI devices

    update_port_binding_for_instance() now checks that a valid migration
    object exists as a parameter before any mapping between old/new PCI
    devices can occur. A migration should be present in the case of a
    cold migration, resize, or evacuation.

    An evacuation (being a special case of a rebuild) however, will not
    pass a migration to update_port_binding_for_instance, as it
    is called directly from setup_instance_network(). This calling function
    does not currently take a migration parameter, even though one will
    certainly exist for an evacuation.

    This commit adds an optional migration parameter to
    setup_instance_network_on_host() and passes any migration object to
    the port update routine.

    Closes-Bug: #1703629
    Related-Bug: #1677621
    Related-Bug: #1630698

    Change-Id: I4e394c8d275995eac4b049a7b1329ea90f2394be
    (cherry picked from commit b930336854bffec1bb81b6d67079a4df59e0af19)
    (cherry picked from commit b9c1a58dd033fa8feeb1175956d57dc90aa55acd)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/605882
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9e146072b6ab5c614cc4502170cc686cc8cc7bce
Submitter: Zuul
Branch: stable/ocata

commit 9e146072b6ab5c614cc4502170cc686cc8cc7bce
Author: paul-carlton2 <email address hidden>
Date: Thu Oct 6 12:02:15 2016 +0100

    Update nova network info when doing rebuild for evacuate operation

    When nova evacuate or host-evacuate are used to recreate instances with
    sriov ports the instances are allocated new device ids on the target and
    neutron is updated accordingly. However the network info data passed
    to the driver spawn method is not updated and thus the instance tries
    to use the device id they were allocated on the source node. If a pre
    existing instance is using that device id or no such device exists on
    the target node then the instance will fail to start.

    Conflicts:
          nova/compute/manager.py
          nova/tests/unit/compute/test_compute_mgr.py

    NOTE(mriedem): The conflict is due to not having change
    Iddae8074554995df22b656bb2e9bddaec6d775cc in Ocata.

    Co-Authored-By: Steven Webster <email address hidden>
    Change-Id: I860ab9cf3f9a38bd4ea5bceecda8105b6fee93dc
    Closes-Bug: #1630698
    Related-Bug: #1677621
    (cherry picked from commit 8e052c7fe9262c38da9d8b5f9a0ee889d9c1c6be)
    (cherry picked from commit 559de0d0a3df92ef9dca21edfaa754a2848013c3)
    (cherry picked from commit c5d8594cb62c1397183888133bef68fe28c62c7a)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.6

This issue was fixed in the openstack/nova 16.1.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.5

This issue was fixed in the openstack/nova 15.1.5 release.
