Baremetal nodes should not be exposing non-custom-resource-class (vcpu, ram, disk)

Bug #1796920 reported by Belmiro Moreira on 2018-10-09
This bug affects 2 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Undecided    Unassigned
Pike                       High         Matt Riedemann
Queens                     High         Stephen Finucane
Rocky                      High         Matt Riedemann

Bug Description

Description
===========
Baremetal nodes report CPU, RAM and DISK inventory.

The issue is that allocations for baremetal nodes are only done considering the custom_resource_class. This happens because baremetal flavors are set to not consume these resources.
See: https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html
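For illustration, here is a rough Python sketch of how a flavor with "resources:" extra spec overrides could end up requesting only the custom resource class. This is a simplified model, not nova's actual code; the function name and the CUSTOM_BAREMETAL_GOLD class are made up for the example.

```python
def requested_resources(vcpus, ram_mb, disk_gb, extra_specs):
    """Simplified model of turning a flavor into a placement resource request.

    Hypothetical helper for illustration only; not nova's implementation.
    """
    # Start from the standard resource classes implied by the flavor.
    request = {"VCPU": vcpus, "MEMORY_MB": ram_mb, "DISK_GB": disk_gb}
    prefix = "resources:"
    # "resources:<CLASS>=<amount>" extra specs override or add amounts.
    for key, value in extra_specs.items():
        if key.startswith(prefix):
            request[key[len(prefix):]] = int(value)
    # An amount of zero means the resource class is not requested at all.
    return {rc: amount for rc, amount in request.items() if amount > 0}


# Baremetal flavor: standard classes zeroed out, one unit of the
# (hypothetical) custom class requested.
baremetal_request = requested_resources(
    vcpus=16,
    ram_mb=65536,
    disk_gb=500,
    extra_specs={
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
        "resources:CUSTOM_BAREMETAL_GOLD": "1",
    },
)
print(baremetal_request)  # {'CUSTOM_BAREMETAL_GOLD': 1}
```

With the overrides in place, an allocation for this flavor consumes only the custom class, so the node's VCPU, MEMORY_MB and DISK_GB inventory stays unallocated.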

If we use a flavor that doesn't include a custom_resource_class, placement can return a baremetal node that is already deployed, because its CPU, RAM and disk still appear available (which results in an error from ironic), or worse, the instance is created on a baremetal node (if it wasn't deployed yet).
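A toy model of the capacity check may make the failure mode clearer. It assumes a node that reports both standard and custom inventory and ignores reserved amounts, allocation ratios and step sizes; the names and numbers are invented for the example and this is not placement's real logic.

```python
def fits(inventory, used, request):
    """Return True if every requested amount fits in the remaining capacity."""
    return all(
        inventory.get(rc, 0) - used.get(rc, 0) >= amount
        for rc, amount in request.items()
    )


# A baremetal node that reports standard classes plus a hypothetical custom class.
node_inventory = {
    "VCPU": 16,
    "MEMORY_MB": 65536,
    "DISK_GB": 500,
    "CUSTOM_BAREMETAL_GOLD": 1,
}
# The node is already deployed, but its allocation only covers the custom class.
node_used = {"CUSTOM_BAREMETAL_GOLD": 1}

vm_request = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}
baremetal_request = {"CUSTOM_BAREMETAL_GOLD": 1}

print(fits(node_inventory, node_used, baremetal_request))  # False
print(fits(node_inventory, node_used, vm_request))         # True
```

The deployed node is correctly rejected for another baremetal flavor, yet it still looks like a valid candidate for a VM flavor because nothing was ever allocated against its standard classes.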

Environment
===========
Nova and Ironic running Queens release.

Matt Riedemann (mriedem) wrote :

The code in the ironic virt driver to report VCPU/MEMORY_MB/DISK_GB inventory was removed in Stein:

https://github.com/openstack/nova/commit/a985e34cdeef777fe7ff943e363a5f1be6d991b7

So this bug applies only to rocky/queens/pike.

Once the ironic instance flavor data migration is complete, it is then safe to schedule only based on ironic node custom resource classes. We have a nova-status check that goes back to queens for making sure you've completed the data migration:

https://review.openstack.org/#/q/Ifd22325e849db2353b1b1eedfe998e3d6a79591c

Workarounds for this would be to use host aggregates to segregate VM and BM hosts and pin flavors to those aggregates, or unset the memory_mb/vcpu properties from ironic nodes, but those workarounds might not be feasible at large scale (like CERN).

We can add a workaround config option to nova to disable reporting standard resource class inventory for operators that can't use the other alternative workarounds mentioned above and who know they have done their data migrations.
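As a sketch of what such an option could control, the following Python models inventory reporting for a single node with a boolean switch. It is an assumption-laden illustration, not the ironic virt driver: the function, the `report_standard_classes` parameter and the sample node are hypothetical, and the name normalization is a simplified approximation.

```python
import re


def node_inventory(node, report_standard_classes=True):
    """Build a simplified inventory mapping for one baremetal node.

    Hypothetical helper for illustration; the real driver reports more
    fields (reserved amounts, allocation ratios, and so on).
    """
    inventory = {}
    if report_standard_classes:
        # Standard resource classes derived from the node's properties.
        inventory["VCPU"] = node["cpus"]
        inventory["MEMORY_MB"] = node["memory_mb"]
        inventory["DISK_GB"] = node["local_gb"]
    resource_class = node.get("resource_class")
    if resource_class:
        # Approximate normalization: upper-case, replace other characters with "_".
        normalized = re.sub(r"[^A-Z0-9_]", "_", resource_class.upper())
        # A whole node is consumed at once, so the custom class has a total of 1.
        inventory["CUSTOM_" + normalized] = 1
    return inventory


node = {"cpus": 16, "memory_mb": 65536, "local_gb": 500, "resource_class": "baremetal.gold"}

print(node_inventory(node, report_standard_classes=False))
# {'CUSTOM_BAREMETAL_GOLD': 1}
```

With the switch off, a VM flavor requesting VCPU, MEMORY_MB or DISK_GB finds no such inventory on the node and cannot land on it.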

tags: added: ironic
Changed in nova:
status: New → Triaged
importance: Undecided → High
status: Triaged → Invalid
importance: High → Undecided

Reviewed: https://review.openstack.org/609043
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f7d0a7671fb73983a1853b0d2f3ce7552d752c31
Submitter: Zuul
Branch: stable/rocky

commit f7d0a7671fb73983a1853b0d2f3ce7552d752c31
Author: Matt Riedemann <email address hidden>
Date: Tue Oct 9 11:45:05 2018 -0400

    [stable-only] Add report_ironic_standard_resource_class_inventory option

    Since Pike it has been possible to schedule ironic nodes
    using custom resource classes. As part of that change, existing
    ironic instances needed to undergo a data migration and until
    that data migration was complete, ironic compute services needed
    to continue reporting standard resource class inventory.

    Once the data migration is complete, the problem with continuing
    to report standard resource class inventory is non-baremetal
    flavors can get scheduled to ironic nodes.

    The standard resource class inventory reporting was removed
    from the ironic driver in Stein:

      If2b8c1a76d7dbabbac7bb359c9e572cfed510800

    Therefore as a stable-only workaround, this change adds an
    option for operators to disable reporting standard resource
    class inventory for ironic nodes once they have confirmed that
    their ironic instance data migrations are complete, which they
    can do via the related "nova-status upgrade check" added in
    change Ifd22325e849db2353b1b1eedfe998e3d6a79591c.

    Change-Id: Id3c74c019da29070811ffc368351e2238b3f6da5
    Closes-Bug: #1796920

This issue was fixed in the openstack/nova 18.1.0 release.
