evacuate rebuild claim will not use any image_meta so it can miss numa_topology claims

Bug #1785318 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Matt Riedemann
Ocata                     Triaged        Medium      Unassigned
Pike                      Triaged        Medium      Unassigned
Queens                    Triaged        Medium      Unassigned
Rocky                     Fix Committed  Medium      Matt Riedemann

Bug Description

I found this in the starlingx diff for nova:

https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-afb9c0c0ca5276c7eacd987bbf51d8e6R447

For volume-backed instances, the instance image_meta comes from the volume_image_metadata in the volume for the root bdm; the API figures that out here:

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/api.py#L1099

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/api.py#L1568
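To make the lookup described above concrete, here is a minimal sketch using plain dicts rather than nova objects. The function name `image_meta_for_volume_backed` and the dict shapes for the bdms and volumes are assumptions made for illustration only; the real logic is in the compute API code linked above and handles more cases.

```python
def image_meta_for_volume_backed(bdms, volumes):
    """Build an image_meta-like dict from the root volume (hypothetical helper).

    bdms: list of dicts with "boot_index" and "volume_id" keys (assumed shape).
    volumes: mapping of volume id to a volume dict (assumed shape).
    """
    for bdm in bdms:
        # The root block device mapping is the one with boot_index 0.
        if bdm.get("boot_index") == 0 and bdm.get("volume_id"):
            volume = volumes[bdm["volume_id"]]
            # Image properties were copied onto the volume when it was
            # created from an image.
            props = dict(volume.get("volume_image_metadata", {}))
            return {"properties": props}
    return {"properties": {}}


bdms = [
    {"boot_index": 0, "volume_id": "vol-root"},
    {"boot_index": 1, "volume_id": "vol-data"},
]
volumes = {
    "vol-root": {"volume_image_metadata": {"hw_numa_nodes": "2"}},
    "vol-data": {},
}
root_image_meta = image_meta_for_volume_backed(bdms, volumes)
```

In this sketch the NUMA-related property only reaches the instance because it was stored on the root volume, which is why the later evacuate path needs some other source for it.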

Then during an evacuate of a volume-backed instance, the rebuild_claim in the ResourceTracker won't actually get the proper image_meta because of this code in ComputeManager.rebuild_instance:

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/manager.py#L2985

The only thing in the claims code that cares about image_meta is for calculating numa_topology claims:

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/claims.py#L295
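A simplified sketch of why an empty image_meta matters for the claim follows. It assumes the NUMA node count can come from either the flavor extra spec `hw:numa_nodes` or the image property `hw_numa_nodes`; the helper `requested_numa_nodes` is hypothetical and ignores the validation and conflict handling that nova's real hardware code performs.

```python
def requested_numa_nodes(flavor_extra_specs, image_meta):
    """Return the requested NUMA node count, or None if nothing asks for one.

    Hypothetical helper: the flavor wins if set, otherwise the image
    property is used. Real nova logic is stricter than this.
    """
    props = image_meta.get("properties", {})
    value = flavor_extra_specs.get("hw:numa_nodes", props.get("hw_numa_nodes"))
    return int(value) if value is not None else None


flavor_extra_specs = {}  # the flavor does not define any NUMA constraint

# Initial boot: the image properties are available to the claim.
at_create = requested_numa_nodes(
    flavor_extra_specs, {"properties": {"hw_numa_nodes": "2"}})

# Evacuate before the fix: the claim is handed an empty dict.
at_evacuate = requested_numa_nodes(flavor_extra_specs, {})
```

Here `at_create` carries the image-defined constraint while `at_evacuate` is None, so a claim computed from the empty dict would not account for the NUMA topology on the destination host.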

I'm not even totally sure if evacuate fully works with an instance using numa topology, but this can't help.

Matt Riedemann (mriedem) wrote :

This is more than just volume-backed instances; it affects evacuate in general. The API doesn't pass down the image_ref, so we just pass {} to the rebuild_claim during evacuate in all cases, volume-backed or not.

summary:
- evacuate rebuild claim will not use any image_meta for volume-backed instances
+ evacuate rebuild claim will not use any image_meta so it can miss numa_topology claims
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/588657

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Matt Riedemann (mriedem)
tags: added: starlingx
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/588657
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=665ba461f3135857034cf33dd5f427d47fdd155e
Submitter: Zuul
Branch: master

commit 665ba461f3135857034cf33dd5f427d47fdd155e
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 3 16:54:49 2018 -0400

    Fix image-defined numa claims during evacuate

    When evacuating, the API does not send the image_ref
    to the compute so currently the compute manager
    rebuild_instance() method will just pass an empty
    dict for image_meta to the rebuild_claim, which means
    if the server was originally created with an image that
    has numa-related constraints, like hw_numa_nodes, those
    constraints would not be applied to the destination host
    during the evacuate.

    This change simply checks for evacuate if image_ref is
    not provided and pulls the image_meta off the instance
    which was stashed in the instance.system_metadata during
    server create (see get_system_metadata_from_image usage
    in the compute API).

    This fix was ported from the starlingx-staging/stx-nova
    repo commit 71acfeae0.

    Change-Id: If548fa3436174b1eae08cdcf6578020cc0c7b81f
    Closes-Bug: #1785318
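
The selection logic described in the commit message can be sketched roughly as follows. This is not the merged patch: the function names, the `fetch_image` callback, and the assumption that image properties are stashed in system_metadata under an `image_` key prefix are simplifications for illustration.

```python
IMAGE_PREFIX = "image_"  # assumed key prefix for stashed image properties


def image_meta_from_system_metadata(system_metadata):
    """Rebuild an image_meta-like dict from stashed system_metadata keys."""
    props = {
        key[len(IMAGE_PREFIX):]: value
        for key, value in system_metadata.items()
        if key.startswith(IMAGE_PREFIX)
    }
    return {"properties": props}


def image_meta_for_rebuild_claim(image_ref, evacuate, system_metadata,
                                 fetch_image):
    """Pick the image_meta to hand to the rebuild claim (hypothetical)."""
    if image_ref:
        # Normal rebuild: the API supplied an image reference.
        return fetch_image(image_ref)
    if evacuate:
        # Evacuate: no image_ref is sent, so fall back to what was
        # stashed on the instance at server create.
        return image_meta_from_system_metadata(system_metadata)
    return {}


system_metadata = {"image_hw_numa_nodes": "2", "other_key": "x"}
claim_image_meta = image_meta_for_rebuild_claim(
    image_ref=None,
    evacuate=True,
    system_metadata=system_metadata,
    fetch_image=lambda ref: {"properties": {}},
)
```

With this fallback, the image-defined `hw_numa_nodes` value is visible to the claim on the destination host even though the evacuate request carried no image_ref.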

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/599062
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6d8426ae59c9c989a38d00ff8c665d5ab129107d
Submitter: Zuul
Branch: stable/rocky

commit 6d8426ae59c9c989a38d00ff8c665d5ab129107d
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 3 16:54:49 2018 -0400

    Fix image-defined numa claims during evacuate

    Change-Id: If548fa3436174b1eae08cdcf6578020cc0c7b81f
    Closes-Bug: #1785318
    (cherry picked from commit 665ba461f3135857034cf33dd5f427d47fdd155e)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.1

This issue was fixed in the openstack/nova 18.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
