evacuate rebuild claim will not use any image_meta so it can miss numa_topology claims

Bug #1785318 reported by Matt Riedemann on 2018-08-03
This bug affects 1 person
Affects                     Importance  Assigned to
OpenStack Compute (nova)    Medium      Matt Riedemann
  Ocata                     Medium      Unassigned
  Pike                      Medium      Unassigned
  Queens                    Medium      Unassigned
  Rocky                     Medium      Matt Riedemann

Bug Description

I found this in the starlingx diff for nova:

https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-afb9c0c0ca5276c7eacd987bbf51d8e6R447

For volume-backed instances, the instance image_meta comes from the volume_image_metadata on the volume for the root bdm; the API figures that out here:

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/api.py#L1099

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/api.py#L1568
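A minimal sketch of the distinction described above, assuming plain dicts in place of nova objects. The function name `image_meta_for_server`, the `get_image` callback, and the flattened `{"properties": ...}` shape are hypothetical illustrations, not nova's API; only the `volume_image_metadata` field name comes from the report.

```python
"""Hypothetical sketch: where a server's image_meta comes from.

Not nova code. Dict shapes and names are assumptions for illustration.
"""
from typing import Any, Callable, Dict, Optional


def image_meta_for_server(
    image_ref: Optional[str],
    root_volume: Optional[Dict[str, Any]],
    get_image: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    # Volume-backed: the root volume carries a copy of the source image's
    # properties in "volume_image_metadata" (simplified to a flat dict).
    if root_volume is not None:
        return {"properties": dict(root_volume.get("volume_image_metadata", {}))}
    # Image-backed: look the image up by its reference.
    if image_ref:
        return get_image(image_ref)
    # Nothing to go on.
    return {}


if __name__ == "__main__":
    volume = {"id": "vol-1", "volume_image_metadata": {"hw_numa_nodes": "2"}}
    print(image_meta_for_server(None, volume, get_image=lambda ref: {}))
    # {'properties': {'hw_numa_nodes': '2'}}
```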

Then during an evacuate of a volume-backed instance, the rebuild_claim in the ResourceTracker won't actually get the proper image_meta because of this code in ComputeManager.rebuild_instance:

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/manager.py#L2985

The only part of the claims code that uses image_meta is the calculation of numa_topology claims:

https://github.com/openstack/nova/blob/4c37ff72e5446c835a48d569dd5a1416fcd36c71/nova/compute/claims.py#L295
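To show why an empty image_meta matters for the NUMA claim, here is a deliberately simplified sketch. `requested_numa_nodes` is a hypothetical stand-in, and the precedence rule (flavor extra spec `hw:numa_nodes` first, else image property `hw_numa_nodes`) is an assumption; nova's real constraint logic handles conflicts and many more properties.

```python
from typing import Any, Dict, Optional


def requested_numa_nodes(
    flavor_extra_specs: Dict[str, str],
    image_meta: Dict[str, Any],
) -> Optional[int]:
    """Hypothetical, simplified NUMA node-count lookup (not nova's code).

    Assumption: the flavor extra spec wins when present; otherwise the
    image property is used. If neither is set, no constraint is requested.
    """
    value = flavor_extra_specs.get("hw:numa_nodes")
    if value is None:
        value = image_meta.get("properties", {}).get("hw_numa_nodes")
    return int(value) if value is not None else None


if __name__ == "__main__":
    image_meta = {"properties": {"hw_numa_nodes": "2"}}
    print(requested_numa_nodes({}, image_meta))  # 2
    # With image_meta == {} the image-defined constraint silently disappears.
    print(requested_numa_nodes({}, {}))  # None
```

The point of the sketch is the second call: when the claim receives `{}`, a constraint that lives only on the image is never requested on the destination host.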

I'm not even totally sure if evacuate fully works with an instance using numa topology, but this can't help.

Matt Riedemann (mriedem) wrote:

This affects more than just volume-backed instances; it's evacuate in general. The API doesn't pass down the image_ref, so we just pass {} to the rebuild_claim during evacuate in all cases, volume-backed or not.
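A minimal sketch of the pre-fix selection logic as described in this comment, assuming a `get_image` callback in place of nova's image API. The function name is hypothetical; the behavior it models (empty dict unless an image_ref is supplied) is what the report describes for rebuild_instance.

```python
from typing import Any, Callable, Dict, Optional


def image_meta_for_rebuild_claim_before_fix(
    image_ref: Optional[str],
    get_image: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    # Only a caller-supplied image_ref is consulted. Evacuate never supplies
    # one, so every evacuate (volume-backed or image-backed) gets {} here.
    image_meta: Dict[str, Any] = {}
    if image_ref:
        image_meta = get_image(image_ref)
    return image_meta


if __name__ == "__main__":
    fake_get_image = lambda ref: {"id": ref, "properties": {"hw_numa_nodes": "2"}}
    print(image_meta_for_rebuild_claim_before_fix(None, fake_get_image))  # {}
```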

summary:
- evacuate rebuild claim will not use any image_meta for volume-backed instances
+ evacuate rebuild claim will not use any image_meta so it can miss numa_topology claims

Fix proposed to branch: master
Review: https://review.openstack.org/588657

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Matt Riedemann (mriedem) on 2018-08-03
tags: added: starlingx

Reviewed: https://review.openstack.org/588657
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=665ba461f3135857034cf33dd5f427d47fdd155e
Submitter: Zuul
Branch: master

commit 665ba461f3135857034cf33dd5f427d47fdd155e
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 3 16:54:49 2018 -0400

    Fix image-defined numa claims during evacuate

    When evacuating, the API does not send the image_ref
    to the compute so currently the compute manager
    rebuild_instance() method will just pass an empty
    dict for image_meta to the rebuild_claim, which means
    if the server was originally created with an image that
    has numa-related constraints, like hw_numa_nodes, those
    constraints would not be applied to the destination host
    during the evacuate.

    This change simply checks for evacuate if image_ref is
    not provided and pulls the image_meta off the instance
    which was stashed in the instance.system_metadata during
    server create (see get_system_metadata_from_image usage
    in the compute API).

    This fix was ported from the starlingx-staging/stx-nova
    repo commit 71acfeae0.

    Change-Id: If548fa3436174b1eae08cdcf6578020cc0c7b81f
    Closes-Bug: #1785318
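The fix described in this commit message can be sketched as follows, under loud assumptions: image properties are stashed in system_metadata under an `image_` key prefix (mirroring the idea behind get_system_metadata_from_image, not its implementation), and all function names here are hypothetical. The commit itself reads `image_meta` off the instance; this sketch reconstructs it from the stashed system_metadata directly.

```python
from typing import Any, Callable, Dict, Optional

# Assumption: stashed image properties use this key prefix.
IMAGE_PREFIX = "image_"


def stash_image_props(system_metadata: Dict[str, str], properties: Dict[str, Any]) -> None:
    # At server create: copy each image property into system_metadata.
    for key, value in properties.items():
        system_metadata[IMAGE_PREFIX + key] = str(value)


def image_meta_from_system_metadata(system_metadata: Dict[str, str]) -> Dict[str, Any]:
    # Reverse of stash_image_props: strip the prefix to rebuild properties.
    props = {
        key[len(IMAGE_PREFIX):]: value
        for key, value in system_metadata.items()
        if key.startswith(IMAGE_PREFIX)
    }
    return {"properties": props}


def image_meta_for_rebuild_claim_after_fix(
    image_ref: Optional[str],
    evacuate: bool,
    system_metadata: Dict[str, str],
    get_image: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    if image_ref:
        return get_image(image_ref)
    if evacuate:
        # No image_ref on evacuate: fall back to what was stashed at create.
        return image_meta_from_system_metadata(system_metadata)
    return {}


if __name__ == "__main__":
    sysmeta: Dict[str, str] = {}
    stash_image_props(sysmeta, {"hw_numa_nodes": 2})
    print(image_meta_for_rebuild_claim_after_fix(None, True, sysmeta, lambda ref: {}))
    # {'properties': {'hw_numa_nodes': '2'}}
```

Because the properties were recorded when the server was created, this fallback works for both volume-backed and image-backed servers without contacting the image service during evacuate.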

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/599062
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6d8426ae59c9c989a38d00ff8c665d5ab129107d
Submitter: Zuul
Branch: stable/rocky

commit 6d8426ae59c9c989a38d00ff8c665d5ab129107d
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 3 16:54:49 2018 -0400

    (cherry picked from commit 665ba461f3135857034cf33dd5f427d47fdd155e)

This issue was fixed in the openstack/nova 18.0.1 release.

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
