Ironic flavor migration and default resource classes

Bug #1816034 reported by Belmiro Moreira on 2019-02-15
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Matt Riedemann
Rocky
High
Unassigned

Bug Description

The Ironic flavor migration to use resource classes happened in Pike/Queens.

The flavors and the instances needed to be upgraded with the correct resource class.
This was done by an online data migration.
Looking into Rocky code: ironic.driver._pike_flavor_migration
There is also an offline data migration using nova-manage.

These migrations added the node resource class into instance_extra.flavor however I don't see that they also included the default resource classes (VCPU, MEMORY_MB, DISK_GB) set to 0.

Looking into Rocky code there is also a TODO in _pike_flavor_migration:
"This code can be removed in Queens, and will need to be updated to also alter extra_specs to zero-out the old-style standard resource classes of VCPU, MEMORY_MB, and DISK_GB."

Currently all my Ironic instances have the correct node resource class defined, but "old" instances (created before the flavor migration) don't have VCPU, MEMORY_MB, DISK_GB set to 0, in instance_extra.flavor.

In Rocky the resource tracker raises the following message:
"There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Inventory for 'VCPU' on resource provider XXXX invalid. ", "title": "Conflict"

because it tries to update the allocation but the inventory doesn't have vcpu resources.

---
As mitigation we now have: "requires_allocation_refresh = False" in the Ironic Driver.

tags: added: compute ironic resource-tracker
Matt Riedemann (mriedem) wrote :

Are you setting [workarounds]/report_ironic_standard_resource_class_inventory=False? I assume so since you said you migrated your flavors and nodes so that scheduling for those baremetal nodes should only use the custom resource class from the ironic node (and flavor).

If so, then the IronicDriver.requires_allocation_refresh variable should probably be changed to:

IronicDriver.requires_allocation_refresh = not CONF.workarounds.report_ironic_standard_resource_class_inventory

Matt Riedemann (mriedem) wrote :

(8:38:42 AM) mriedem: belmoreira: and you have this enabled yes? https://review.openstack.org/#/c/609043/
(8:38:53 AM) mriedem: which is probably why the ironic driver isn't reporting standard resource class inventory
(8:44:45 AM) mriedem: i wonder if we should just set IronicDriver.requires_allocation_refresh = not CONF.workarounds.report_ironic_standard_resource_class_inventory
(8:45:11 AM) mriedem: because if we're not going to report the standard inventory we shouldn't try to put allocations for those standard inventory resource classes
(8:46:23 AM) openstackgerrit: Merged openstack/nova master: Libvirt: do not set MAC when unplugging macvtap VF https://review.openstack.org/624842
(8:46:34 AM) belmoreira: mriedem: is set to False

Matt Riedemann (mriedem) wrote :

Correction:

IronicDriver.requires_allocation_refresh = CONF.workarounds.report_ironic_standard_resource_class_inventory

Changed in nova:
status: New → Triaged
Matt Riedemann (mriedem) wrote :

Maybe that won't work because then old instance allocations won't be reported to placement which would re-introduce bug 1724589:

(8:54:14 AM) belmoreira: mriedem: we are fixing this in our instances_extra because we will need to recreate the allocations/RP. To make sure RP will not not change uuid in a sharded nova-compute setup with ironic
(8:54:36 AM) belmoreira: otherwise allocations will not be recreated
(8:54:51 AM) mriedem: you mean for old instances
(8:54:58 AM) mriedem: otherwise the scheduler would create the allocations
(8:55:14 AM) belmoreira: yes, for old instances

Matt Riedemann (mriedem) on 2019-02-15
Changed in nova:
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/637217

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Matt Riedemann (mriedem) wrote :

I've removed Pike and Queens from this since those are lower priority since https://review.openstack.org/#/q/Id3c74c019da29070811ffc368351e2238b3f6da5 hasn't landed on those branches.

no longer affects: nova/queens
no longer affects: nova/pike
Matt Riedemann (mriedem) on 2019-02-15
Changed in nova:
importance: High → Medium
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers