Baremetal nodes should not be exposing non-custom-resource-class (vcpu, ram, disk)

Bug #1796920 reported by Belmiro Moreira
This bug affects 2 people
Affects                   Status         Importance  Assigned to       Milestone
OpenStack Compute (nova)  Invalid        Undecided   Unassigned
  Pike                    In Progress    High        Matt Riedemann
  Queens                  In Progress    High        Stephen Finucane
  Rocky                   Fix Committed  High        Matt Riedemann

Bug Description

Description
===========
Baremetal nodes report CPU, RAM and DISK inventory.

The issue is that allocations for baremetal nodes are only done considering the custom_resource_class. This happens because baremetal flavors are set to not consume these resources.
See: https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html
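As a rough illustration of what "set to not consume these resources" means, the sketch below derives the resources a flavor would request once `resources:*` extra specs are applied. It is a simplified model, not nova's code: the function name `requested_resources`, the flavor sizes and the class `CUSTOM_BAREMETAL_GOLD` are assumptions made for the example.

```python
def requested_resources(vcpus, ram_mb, disk_gb, extra_specs):
    """Return {resource_class: amount} for a flavor (simplified model)."""
    # Start from the standard resource classes implied by the flavor size.
    request = {"VCPU": vcpus, "MEMORY_MB": ram_mb, "DISK_GB": disk_gb}
    prefix = "resources:"
    # "resources:<CLASS>" extra specs override or add requested amounts.
    for key, value in extra_specs.items():
        if key.startswith(prefix):
            request[key[len(prefix):]] = int(value)
    # Classes overridden to 0 are dropped from the request entirely.
    return {rc: amount for rc, amount in request.items() if amount > 0}


# Hypothetical baremetal flavor: standard classes zeroed, custom class requested.
baremetal_flavor = requested_resources(
    vcpus=16, ram_mb=65536, disk_gb=500,
    extra_specs={
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
        "resources:CUSTOM_BAREMETAL_GOLD": "1",
    },
)
# Hypothetical VM flavor with no overrides.
vm_flavor = requested_resources(vcpus=2, ram_mb=4096, disk_gb=20, extra_specs={})

print(baremetal_flavor)  # {'CUSTOM_BAREMETAL_GOLD': 1}
print(vm_flavor)         # {'VCPU': 2, 'MEMORY_MB': 4096, 'DISK_GB': 20}
```

In this model a baremetal flavor only ever allocates the custom class, so the node's VCPU, MEMORY_MB and DISK_GB inventory is never consumed.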

If we use a flavor that doesn't include a custom_resource_class, placement can return baremetal nodes that are already deployed, because their CPU, RAM and disk still appear to be available (this results in an error from ironic), or, worse, the instance is created on a baremetal node (if it wasn't deployed yet).
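The failure mode can be sketched with a toy model of candidate selection: a provider qualifies when, for every requested class, inventory minus existing allocations covers the request. This is not placement's actual implementation; the function `is_candidate`, the amounts and the class `CUSTOM_BAREMETAL_GOLD` are made up for illustration.

```python
def is_candidate(request, inventory, allocations):
    """True if free capacity (inventory - allocations) covers the request."""
    return all(
        inventory.get(rc, 0) - allocations.get(rc, 0) >= amount
        for rc, amount in request.items()
    )


# An ironic node that reports both standard and custom inventory.
node_inventory = {
    "VCPU": 16, "MEMORY_MB": 65536, "DISK_GB": 500,
    "CUSTOM_BAREMETAL_GOLD": 1,
}
# Already deployed with a baremetal flavor: only the custom class is allocated.
deployed_allocations = {"CUSTOM_BAREMETAL_GOLD": 1}

vm_request = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}
baremetal_request = {"CUSTOM_BAREMETAL_GOLD": 1}

# A second baremetal request is correctly rejected on the deployed node.
print(is_candidate(baremetal_request, node_inventory, deployed_allocations))  # False
# A VM flavor still matches the deployed node (ironic later returns an error).
print(is_candidate(vm_request, node_inventory, deployed_allocations))  # True
# A VM flavor also matches an undeployed node (the instance lands on baremetal).
print(is_candidate(vm_request, node_inventory, {}))  # True
```

If the node reported only the custom class, both VM requests in this model would be rejected, since the standard classes would be missing from its inventory.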

Environment
===========
Nova and Ironic running Queens release.

Tags: ironic
Matt Riedemann (mriedem) wrote :

The code in the ironic virt driver to report VCPU/MEMORY_MB/DISK_GB inventory was removed in Stein:

https://github.com/openstack/nova/commit/a985e34cdeef777fe7ff943e363a5f1be6d991b7

So this bug applies only to rocky/queens/pike.

Once the ironic instance flavor data migration is complete, it is then safe to schedule only based on ironic node custom resource classes. We have a nova-status check that goes back to queens for making sure you've completed the data migration:

https://review.openstack.org/#/q/Ifd22325e849db2353b1b1eedfe998e3d6a79591c

Workarounds for this would be to use host aggregates to segregate VM and BM hosts and pin flavors to those aggregates, or unset the memory_mb/vcpu properties from ironic nodes, but those workarounds might not be feasible at large scale (like CERN).

We can add a workaround config option to nova to disable reporting standard resource class inventory for operators that can't use the other alternative workarounds mentioned above and who know they have done their data migrations.

tags: added: ironic
Changed in nova:
status: New → Triaged
importance: Undecided → High
status: Triaged → Invalid
importance: High → Undecided
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/609043

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/609043
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f7d0a7671fb73983a1853b0d2f3ce7552d752c31
Submitter: Zuul
Branch: stable/rocky

commit f7d0a7671fb73983a1853b0d2f3ce7552d752c31
Author: Matt Riedemann <email address hidden>
Date: Tue Oct 9 11:45:05 2018 -0400

    [stable-only] Add report_ironic_standard_resource_class_inventory option

    Since Pike it has been possible to schedule ironic nodes
    using custom resource classes. As part of that change, existing
    ironic instances needed to undergo a data migration and until
    that data migration was complete, ironic compute services needed
    to continue reporting standard resource class inventory.

    Once the data migration is complete, the problem with continuing
    to report standard resource class inventory is non-baremetal
    flavors can get scheduled to ironic nodes.

    The standard resource class inventory reporting was removed
    from the ironic driver in Stein:

      If2b8c1a76d7dbabbac7bb359c9e572cfed510800

    Therefore as a stable-only workaround, this change adds an
    option for operators to disable reporting standard resource
    class inventory for ironic nodes once they have confirmed that
    their ironic instance data migrations are complete, which they
    can do via the related "nova-status upgrade check" added in
    change Ifd22325e849db2353b1b1eedfe998e3d6a79591c.

    Change-Id: Id3c74c019da29070811ffc368351e2238b3f6da5
    Closes-Bug: #1796920
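A minimal sketch of how the option might be used, assuming it lives in the `[workarounds]` group of nova.conf on the ironic compute service and that reporting stays enabled unless it is set to False (check the release notes for your branch). The snippet only parses an example fragment with Python's standard `configparser` and models the effect on reported inventory; `reported_inventory` and the node fields are hypothetical, not nova code.

```python
import configparser

# Example nova.conf fragment (assumed group name: [workarounds]).
NOVA_CONF_FRAGMENT = """
[workarounds]
report_ironic_standard_resource_class_inventory = False
"""

parser = configparser.ConfigParser()
parser.read_string(NOVA_CONF_FRAGMENT)
report_standard = parser.getboolean(
    "workarounds",
    "report_ironic_standard_resource_class_inventory",
    fallback=True,  # assumption: reporting is on when the option is unset
)


def reported_inventory(node, report_standard):
    """Model the inventory an ironic node would expose (hypothetical helper)."""
    # The custom resource class is always reported, one unit per node.
    inventory = {node["custom_class"]: 1}
    if report_standard:
        # Standard classes are only added while the workaround is not applied.
        inventory["VCPU"] = node["cpus"]
        inventory["MEMORY_MB"] = node["memory_mb"]
        inventory["DISK_GB"] = node["local_gb"]
    return inventory


node = {
    "custom_class": "CUSTOM_BAREMETAL_GOLD",
    "cpus": 16, "memory_mb": 65536, "local_gb": 500,
}
print(reported_inventory(node, report_standard))  # {'CUSTOM_BAREMETAL_GOLD': 1}
```

As the commit message notes, this is only safe once the ironic instance flavor data migration has been confirmed complete with "nova-status upgrade check".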

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/620111

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/620113

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.1.0

This issue was fixed in the openstack/nova 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/620111
Reason: I'm not actively working on this.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/620113
Reason: I'm not actively working on this.
