Old nova instances can't be started on post-Victoria deployments

Bug #2080556 reported by sean mooney
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: sean mooney

Bug Description

Downstream we had an interesting bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2311875

Instances created after Liberty but before Victoria
that request a NUMA topology but do not use CPU pinning
cannot be started on post-Victoria nova.

As part of the
https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/cpu-resources.html
spec, we started tracking CPUs as PCPU and VCPU resource classes, but since a given instance
would have either pinned CPUs or floating CPUs, no changes to the instance NUMA topology object
were required.
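A rough way to see why Train needed no object change: if a single flavor-level policy decides how every CPU of an instance is claimed, the resource class is one per-instance choice. The sketch below assumes that hw:cpu_policy=dedicated maps to PCPU and anything else to VCPU; cpu_resource_request is a hypothetical helper, not nova's actual translation code, and it ignores other extra specs that can affect the request.

```python
# Hypothetical sketch, not nova's flavor-to-placement translation code.
# Assumption: in the Train model, hw:cpu_policy alone decides whether all of an
# instance's CPUs are pinned (PCPU) or floating (VCPU); there is no per-CPU mix.
def cpu_resource_request(cpu_policy, vcpus):
    """Return a placement-style resource request for an instance's CPUs.

    cpu_policy mirrors the hw:cpu_policy extra spec ("dedicated", "shared"
    or None when unset); vcpus is the flavor's vCPU count.
    """
    if cpu_policy == "dedicated":
        return {"PCPU": vcpus}
    return {"VCPU": vcpus}


print(cpu_resource_request("dedicated", 4))  # {'PCPU': 4}
print(cpu_resource_request(None, 2))         # {'VCPU': 2}
```

Because the answer is all-or-nothing per instance, one CPU set per NUMA cell was enough to describe either case.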

With the introduction of mixed CPUs in a single instance

https://specs.openstack.org/openstack/nova-specs/specs/victoria/implemented/use-pcpu-vcpu-in-one-instance.html

the instance NUMA topology object was extended with a new pcpuset field.
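A minimal sketch of the cell layout after that change, assuming a simplified model where cpuset holds the floating guest CPUs and pcpuset holds the pinned ones. NUMACellSketch is a hypothetical stand-in, not nova's InstanceNUMACell.

```python
from dataclasses import dataclass, field


@dataclass
class NUMACellSketch:
    """Hypothetical stand-in for a Victoria-era instance NUMA cell.

    cpuset holds floating (shared) guest CPUs; pcpuset holds pinned guest CPUs.
    """
    id: int
    cpuset: set = field(default_factory=set)
    pcpuset: set = field(default_factory=set)

    @property
    def all_cpus(self):
        # A mixed cell uses both sets; a purely pinned or purely floating
        # cell leaves one of them empty.
        return self.cpuset | self.pcpuset


mixed = NUMACellSketch(id=0, cpuset={0, 1}, pcpuset={2, 3})
floating = NUMACellSketch(id=1, cpuset={4, 5})
print(sorted(mixed.all_cpus))  # [0, 1, 2, 3]
print(floating.pcpuset)        # set()
```

Unlike this dataclass, which always defaults pcpuset, the real field is not filled in automatically when an older serialised cell without pcpuset is loaded, which is what this bug is about.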

As part of that work, the _migrate_legacy_object function was extended to default pcpuset to an empty set
https://github.com/openstack/nova/commit/867d4471013bf6a70cd3e9e809daf80ea358df92#diff-ed76deb872002cf64931c6d3f2d5967396240dddcb93da85f11886afc7dc4333R212
for NUMA topologies that predate OVO (oslo.versionedobjects).
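The two load paths can be sketched roughly as follows. This assumes, as a simplification, that a blob is recognised as OVO-serialised by its nova_object.* envelope keys and that cells are plain dicts of field values; load_numa_topology and both blob layouts are hypothetical, and the real legacy format carries more data than shown.

```python
import json


def load_numa_topology(db_blob):
    """Hypothetical sketch of the two load paths for a stored NUMA topology.

    Assumption: cells are simplified to plain dicts of field values; the
    real legacy and OVO blob layouts carry more keys than shown here.
    """
    primitive = json.loads(db_blob)
    if "nova_object.name" in primitive:
        # OVO path: fields come straight from the blob, so a field that was
        # never serialised (pcpuset before Victoria) is simply absent.
        cells = primitive["nova_object.data"]["cells"]
        return "ovo", cells
    # Legacy pre-OVO path: cells are rebuilt, so pcpuset can be defaulted.
    cells = [dict(cell, pcpuset=set()) for cell in primitive["cells"]]
    return "legacy", cells


legacy_blob = json.dumps({"cells": [{"id": 0, "cpuset": [0, 1]}]})
ovo_blob = json.dumps({
    "nova_object.name": "InstanceNUMATopology",
    "nova_object.data": {"cells": [{"id": 0, "cpuset": [0, 1]}]},
})

path, cells = load_numa_topology(legacy_blob)
print(path, cells[0]["pcpuset"])    # legacy set()
path, cells = load_numa_topology(ovo_blob)
print(path, "pcpuset" in cells[0])  # ovo False
```

The point of the sketch is the asymmetry: only the legacy path rebuilds cells and therefore gets a natural place to default pcpuset.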

In addition, a new _migrate_legacy_dedicated_instance_cpuset function was added to migrate existing pinned instances and instances whose NUMA topology is already stored as OVO in the DB.

What we missed in the review is that unpinned guests should have had cell.pcpuset set to the empty set
here:
https://github.com/openstack/nova/commit/867d4471013bf6a70cd3e9e809daf80ea358df92#diff-ed76deb872002cf64931c6d3f2d5967396240dddcb93da85f11886afc7dc4333R178
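A hedged sketch of the shape of that migration, assuming it loops over the cells and only acts on cells with a dedicated CPU policy. migrate_dedicated_cells and the dict-based cells are hypothetical illustrations, not the code from the linked commit; the comment marks where the empty-set default was missing.

```python
DEDICATED = "dedicated"


def migrate_dedicated_cells(cells):
    """Sketch of a Victoria-style migration of OVO-loaded cells (with the gap).

    Each cell is a dict holding only the fields present in the stored blob.
    Returns True if any cell was modified.
    """
    changed = False
    for cell in cells:
        if cell.get("cpu_policy") != DEDICATED:
            # The gap: unpinned cells are skipped, so a pre-Victoria blob
            # leaves them without any pcpuset value.
            continue
        if "pcpuset" not in cell:
            # Pinned CPUs used to be stored in cpuset; move them.
            cell["pcpuset"] = set(cell["cpuset"])
            cell["cpuset"] = set()
            changed = True
    return changed


cells = [
    {"id": 0, "cpu_policy": DEDICATED, "cpuset": {0, 1}},
    {"id": 1, "cpu_policy": None, "cpuset": {2, 3}},  # e.g. hw:numa_nodes only
]
print(migrate_dedicated_cells(cells))              # True
print(cells[0]["pcpuset"], "pcpuset" in cells[1])  # {0, 1} False
```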

The new field is not nullable and is not present in the existing JSON-serialised object.
As a result, if the VM was created between Liberty and Victoria, accessing cell.pcpuset on an object returned from the DB raises NotImplementedError because the field is unset.
This only applies to non-pinned VMs with a NUMA topology, i.e. those with
hw:mem_page_size=<anything> or hw:numa_nodes=<anything>.
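To show why an absent field surfaces as NotImplementedError rather than as a default: in oslo.versionedobjects, reading a field that was never set falls back to the object's lazy-load hook, and the base implementation of that hook raises NotImplementedError. The class below is a hypothetical, heavily simplified imitation of that behaviour in plain Python; its error message is made up and differs from the real one.

```python
import json


class UnsetFieldSketch:
    """Hypothetical, heavily simplified imitation of a versioned object.

    Declared fields that were never assigned are missing from __dict__;
    reading one falls through to __getattr__, which refuses to lazy-load it.
    """

    fields = ("id", "cpuset", "pcpuset")

    def __init__(self, **values):
        unknown = set(values) - set(self.fields)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        self.__dict__.update(values)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in type(self).fields:
            raise NotImplementedError(f"cannot lazy-load unset field {name!r}")
        raise AttributeError(name)


# A cell serialised before Victoria: the blob has no pcpuset key at all.
blob = json.loads('{"id": 0, "cpuset": [0, 1]}')
cell = UnsetFieldSketch(id=blob["id"], cpuset=set(blob["cpuset"]))
print(cell.cpuset)  # {0, 1}
try:
    cell.pcpuset
except NotImplementedError as exc:
    print("NotImplementedError:", exc)
```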

Tags: numa
sean mooney (sean-k-mooney) wrote :

We could argue whether this is High or Medium, because it's been so long since this regression was introduced and it does not impact newly created VMs on Victoria or later.

But for people with an older cloud, this bug technically still exists in master if they are moving from something ancient to something more recent.

Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/929187

summary: - old nova instance cant be started on post victoria deployments
+ old nova instances cant be started on post victoria deployments
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/929025
Committed: https://opendev.org/openstack/nova/commit/521db4a4353ac884252270cba226034e01062781
Submitter: "Zuul (22348)"
Branch: master

commit 521db4a4353ac884252270cba226034e01062781
Author: Sean Mooney <email address hidden>
Date: Thu Sep 12 13:47:30 2024 +0100

    reproduce post liberty pre victoria instance numa db issue

    This change reproduces a bug in the db load of old
    instance numa topology json blobs in
    _migrate_legacy_dedicated_instance_cpuset
    that failed to account for defaulting pcpuset to
    the empty set when it is not in the json blob.

    Related-Bug: #2080556
    Change-Id: Ia0f327c501f65786d5b2538b2742ec2786486956

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/929187
Committed: https://opendev.org/openstack/nova/commit/2a870323c3d44d2056b326c184c435a484513532
Submitter: "Zuul (22348)"
Branch: master

commit 2a870323c3d44d2056b326c184c435a484513532
Author: Sean Mooney <email address hidden>
Date: Thu Sep 12 21:05:54 2024 +0100

    allow upgrade of pre-victoria InstanceNUMACells

    This change ensures that if we are upgrading an
    InstanceNUMACell object created before victoria
    (<1.5), we properly set pcpuset=set() when
    loading the object from the db.

    This is required to support instances with a numa
    topology that do not use cpu pinning.

    Depends-On: https://review.opendev.org/c/openstack/python-openstackclient/+/929236
    Closes-Bug: #2080556
    Change-Id: Iea55aabe71c250d8c8e93c61421450b909a7fa3d
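The version-gated default described in that commit message can be sketched as below. Assumptions: the stored cell is available as a dict together with its serialised object version, and pinned cells still need their CPUs moved from cpuset to pcpuset; upgrade_cell_data, parse_version and PCPUSET_ADDED_IN are hypothetical names, not the code merged in review 929187.

```python
PCPUSET_ADDED_IN = (1, 5)  # per the commit message: cells < 1.5 predate pcpuset


def parse_version(version):
    # "1.4" -> (1, 4); tuples compare numerically, so "1.10" > "1.5".
    major, minor = version.split(".")
    return int(major), int(minor)


def upgrade_cell_data(version, data):
    """Sketch: fill in pcpuset for a cell serialised before the field existed.

    data is a dict of the fields found in the stored blob; a new dict is
    returned and the input is left untouched.
    """
    data = dict(data)
    if parse_version(version) >= PCPUSET_ADDED_IN or "pcpuset" in data:
        return data
    if data.get("cpu_policy") == "dedicated":
        # Pinned CPUs used to live in cpuset; move them to pcpuset.
        data["pcpuset"] = set(data.get("cpuset", ()))
        data["cpuset"] = set()
    else:
        # The case this bug is about: unpinned cells get an empty pcpuset.
        data["pcpuset"] = set()
    return data


old_unpinned = {"id": 0, "cpu_policy": None, "cpuset": {0, 1}}
print(upgrade_cell_data("1.4", old_unpinned)["pcpuset"])  # set()
```

Comparing versions as integer tuples rather than as strings avoids the trap where "1.10" would sort before "1.5".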

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 30.0.0.0rc1

This issue was fixed in the openstack/nova 30.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/2024.1)

Related fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/nova/+/932064

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/nova/+/932065
