Migration fails with "NotImplementedError: Cannot load 'pcpuset' in the base class" when a pre Victoria instance with cpu pinning is migrated in Victoria

Bug #1952941 reported by Balazs Gibizer
This bug affects 4 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Balazs Gibizer
Victoria                  In Progress   Medium      Balazs Gibizer
Wallaby                   Fix Released  Medium      Balazs Gibizer
Xena                      Fix Released  Medium      Balazs Gibizer

Bug Description

When the cpuset -> pcpuset data migration was added to InstanceNUMATopology [1], it was missed that such an object is hydrated not only via InstanceNUMATopology.get_by_instance_uuid() but also, indirectly, via RequestSpec.get_by_instance_uuid(). The latter code path does not call InstanceNUMATopology.obj_from_db_obj(), which triggers the data migration via InstanceNUMATopology._migrate_legacy_dedicated_instance_cpuset. As a result, when the new nova code loads an old RequestSpec object from the DB (e.g. during migration of an instance), the InstanceNUMATopology in the RequestSpec is not migrated to the new object version, which leads to errors when the pcpuset field is read during scheduling.
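
The two load paths described above can be sketched roughly as follows. This is a simplified model, not nova's code: the cells are plain dicts, and the function names load_via_instance_extra, load_via_request_spec and migrate_legacy_dedicated_cpuset are hypothetical stand-ins. The migration rule (pinned CPUs move from cpuset to pcpuset when the policy is dedicated) is an assumption based on the description in this report.

```python
# Simplified stand-ins; these are NOT nova's real classes or signatures.
import json


def migrate_legacy_dedicated_cpuset(cell):
    """Move pinned CPUs from the legacy 'cpuset' key to 'pcpuset'.

    Returns True if the cell was changed.
    """
    if "pcpuset" in cell:
        return False  # already in the new format
    if cell.get("cpu_policy") == "dedicated":
        cell["pcpuset"] = cell["cpuset"]
        cell["cpuset"] = []
    else:
        cell["pcpuset"] = []
    return True


def load_via_instance_extra(blob):
    """Path that runs the data migration on load."""
    topology = json.loads(blob)
    for cell in topology["cells"]:
        migrate_legacy_dedicated_cpuset(cell)
    return topology


def load_via_request_spec(blob):
    """Path affected by this bug: deserializes without migrating."""
    spec = json.loads(blob)
    return spec["numa_topology"]


legacy_cell = {"id": 0, "cpuset": [0, 1], "cpu_policy": "dedicated"}
extra_blob = json.dumps({"cells": [legacy_cell]})
spec_blob = json.dumps({"numa_topology": {"cells": [legacy_cell]}})

migrated = load_via_instance_extra(extra_blob)
stale = load_via_request_spec(spec_blob)

print(migrated["cells"][0])            # has both 'cpuset' and 'pcpuset'
print("pcpuset" in stale["cells"][0])  # False: the field was never populated
```

The same stored data ends up in two different shapes depending on which path loaded it, which is why only operations that go through the RequestSpec (such as migrate / evacuate) are affected.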

To reproduce:
* Install a pre-Victoria cloud
* Create an instance with CPU pinning
* Upgrade to Victoria or newer
* Try to migrate / evacuate the instance

You will see the following stack trace in the nova-scheduler log:

2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server return func(*args, **kwargs)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/manager.py", line 215, in select_destinations
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server allocation_request_version, return_alternates)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 96, in select_destinations
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server allocation_request_version, return_alternates)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 210, in _schedule
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server hosts = self._get_sorted_hosts(spec_obj, hosts, num)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 441, in _get_sorted_hosts
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server spec_obj, index)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py", line 606, in get_filtered_hosts
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server hosts, spec_obj, index)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/filters.py", line 88, in get_filtered_objects
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server list_objs = list(objs)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/filters.py", line 43, in filter_all
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server if self._filter_one(obj, spec_obj):
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/filters/__init__.py", line 44, in _filter_one
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server return self.host_passes(obj, spec)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/scheduler/filters/numa_topology_filter.py", line 104, in host_passes
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server pci_stats=host_state.pci_stats))
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/hardware.py", line 2294, in numa_fit_instance_to_host
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server host_cell, instance_cell, limits, cpuset_reserved)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/hardware.py", line 1109, in _numa_fit_instance_cell
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server required_cpus = len(instance_cell.pcpuset) + cpuset_reserved
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server self.obj_load_attr(name)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 601, in obj_load_attr
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server _("Cannot load '%s' in the base class") % attrname)
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server NotImplementedError: Cannot load 'pcpuset' in the base class
2021-11-30 17:36:38.963 48 ERROR oslo_messaging.rpc.server

[1] https://review.opendev.org/c/openstack/nova/+/714658
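
The NotImplementedError at the bottom of the trace comes from the lazy-load fallback: the field getter finds that pcpuset was never set and asks the object to load it, and the base class cannot do that. A minimal sketch of that behaviour, assuming a simplified descriptor in place of the real oslo.versionedobjects getter (LazyField, BaseObject and Cell are hypothetical names, not the library's classes):

```python
class LazyField:
    """Descriptor that mimics a versioned-object field getter (stand-in)."""

    def __set_name__(self, owner, name):
        self.name = name
        self.attr = "_obj_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not hasattr(obj, self.attr):
            # Field was never set: fall back to lazy loading.
            obj.obj_load_attr(self.name)
        return getattr(obj, self.attr)

    def __set__(self, obj, value):
        setattr(obj, self.attr, value)


class BaseObject:
    def obj_load_attr(self, attrname):
        # The base class has no way to fetch a missing field.
        raise NotImplementedError(
            "Cannot load '%s' in the base class" % attrname)


class Cell(BaseObject):
    cpuset = LazyField()
    pcpuset = LazyField()


cell = Cell()
cell.cpuset = {0, 1}  # legacy data: only cpuset was ever set

try:
    len(cell.pcpuset)
    error = None
except NotImplementedError as exc:
    error = str(exc)

print(error)  # Cannot load 'pcpuset' in the base class
```

So the error does not mean the data is lost; it means the object was deserialized from the old format and the new field was never filled in.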

summary: - NotImplementedError: Cannot load 'pcpuset' in the base class if a pre
- Victoria instance with cpu pinning is migrated in Victoria
+ Migration fails with "NotImplementedError: Cannot load 'pcpuset' in the
+ base class" when a pre Victoria instance with cpu pinning is migrated in
+ Victoria
tags: added: upgrade
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/820121

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/820153

Changed in nova:
status: New → In Progress
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
importance: Undecided → Medium
tags: added: numa scheduler
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/820121
Committed: https://opendev.org/openstack/nova/commit/05e8977cb2fd660dcf6af32c5804f7548bf722a7
Submitter: "Zuul (22348)"
Branch: master

commit 05e8977cb2fd660dcf6af32c5804f7548bf722a7
Author: Balazs Gibizer <email address hidden>
Date: Wed Dec 1 18:33:32 2021 +0100

    Reproduce bug 1952941

    The added unit test proves that pre-Victoria RequestSpec objects
    describing a cpu pinned Instance are not migrated to a proper format
    in Victoria or newer.

    Related-Bug: #1952941

    Change-Id: I672af45a1d1c7fb428b1c4983d4f856852829fb9

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/nova/+/820153
Committed: https://opendev.org/openstack/nova/commit/e853bb57181721725a89656b3cb3058636630a6e
Submitter: "Zuul (22348)"
Branch: master

commit e853bb57181721725a89656b3cb3058636630a6e
Author: Balazs Gibizer <email address hidden>
Date: Thu Dec 2 12:52:01 2021 +0100

    Migrate RequestSpec.numa_topology to use pcpuset

    When the InstanceNUMATopology OVO has changed in
    I901fbd7df00e45196395ff4c69e7b8aa3359edf6 to separately track
    pcpus from vcpus a data migration was added. This data migration is
    triggered when the InstanceNUMATopology object is loaded from the
    instance_extra table. However that patch is missed the fact that the
    InstanceNUMATopology object can be loaded from the request_spec table as
    well. So InstanceNUMATopology object in RequestSpec are not migrated.
    This could lead to errors in the scheduler when such RequestSpec object
    is used for scheduling (e.g. during a migration of a pre Victoria
    instance with cpu pinning)

    This patch adds the missing data migration.

    Change-Id: I812d720555bdf008c83cae3d81541a37bd99e594
    Closes-Bug: #1952941
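
For readers following along, the idea of the missing data migration can be sketched like this. It is an illustration under assumptions, not the merged patch: the spec is modelled as a JSON dict, load_request_spec and migrate_cells are hypothetical names, and the save callback is an assumed persistence hook (this report does not show whether or how the real change writes the migrated data back).

```python
import json


def migrate_cells(topology):
    """Return True if any legacy cell was converted in place."""
    changed = False
    for cell in topology["cells"]:
        if "pcpuset" in cell:
            continue  # already in the new format
        pinned = cell.get("cpu_policy") == "dedicated"
        cell["pcpuset"] = cell["cpuset"] if pinned else []
        if pinned:
            cell["cpuset"] = []
        changed = True
    return changed


def load_request_spec(blob, save):
    """Deserialize a spec and migrate its NUMA topology when needed."""
    spec = json.loads(blob)
    topology = spec.get("numa_topology")
    if topology and migrate_cells(topology):
        save(spec)  # hypothetical persistence hook
    return spec


saved = []
old_blob = json.dumps({"numa_topology": {"cells": [
    {"id": 0, "cpuset": [2, 3], "cpu_policy": "dedicated"}]}})
spec = load_request_spec(old_blob, saved.append)
cell = spec["numa_topology"]["cells"][0]
print(cell["pcpuset"], cell["cpuset"], len(saved))

# Loading the already migrated form again triggers no second save.
load_request_spec(json.dumps(spec), saved.append)
```

Returning a "changed" flag from the migration keeps the load path cheap for specs that are already in the new format.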

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/827868

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/827869

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/827870

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/827871

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/827872

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/827873

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/827868
Committed: https://opendev.org/openstack/nova/commit/d860615527a4f492251c363b2092674d55228a40
Submitter: "Zuul (22348)"
Branch: stable/xena

commit d860615527a4f492251c363b2092674d55228a40
Author: Balazs Gibizer <email address hidden>
Date: Wed Dec 1 18:33:32 2021 +0100

    Reproduce bug 1952941

    The added unit test proves that pre-Victoria RequestSpec objects
    describing a cpu pinned Instance are not migrated to a proper format
    in Victoria or newer.

    Related-Bug: #1952941

    Change-Id: I672af45a1d1c7fb428b1c4983d4f856852829fb9
    (cherry picked from commit 05e8977cb2fd660dcf6af32c5804f7548bf722a7)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/nova/+/827869
Committed: https://opendev.org/openstack/nova/commit/7f6ec8cf546cf8f437ee94bb2308447427f54ada
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 7f6ec8cf546cf8f437ee94bb2308447427f54ada
Author: Balazs Gibizer <email address hidden>
Date: Thu Dec 2 12:52:01 2021 +0100

    Migrate RequestSpec.numa_topology to use pcpuset

    When the InstanceNUMATopology OVO has changed in
    I901fbd7df00e45196395ff4c69e7b8aa3359edf6 to separately track
    pcpus from vcpus a data migration was added. This data migration is
    triggered when the InstanceNUMATopology object is loaded from the
    instance_extra table. However that patch is missed the fact that the
    InstanceNUMATopology object can be loaded from the request_spec table as
    well. So InstanceNUMATopology object in RequestSpec are not migrated.
    This could lead to errors in the scheduler when such RequestSpec object
    is used for scheduling (e.g. during a migration of a pre Victoria
    instance with cpu pinning)

    This patch adds the missing data migration.

    Change-Id: I812d720555bdf008c83cae3d81541a37bd99e594
    Closes-Bug: #1952941
    (cherry picked from commit e853bb57181721725a89656b3cb3058636630a6e)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/827870
Committed: https://opendev.org/openstack/nova/commit/b190c30f005bd1b1a7e3d9bf35648e65ff472f02
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit b190c30f005bd1b1a7e3d9bf35648e65ff472f02
Author: Balazs Gibizer <email address hidden>
Date: Wed Dec 1 18:33:32 2021 +0100

    Reproduce bug 1952941

    The added unit test proves that pre-Victoria RequestSpec objects
    describing a cpu pinned Instance are not migrated to a proper format
    in Victoria or newer.

    Related-Bug: #1952941

    Change-Id: I672af45a1d1c7fb428b1c4983d4f856852829fb9
    (cherry picked from commit 05e8977cb2fd660dcf6af32c5804f7548bf722a7)
    (cherry picked from commit d860615527a4f492251c363b2092674d55228a40)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/nova/+/827871
Committed: https://opendev.org/openstack/nova/commit/dad566614c92841aba65d3f0a69e0c580457cb46
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit dad566614c92841aba65d3f0a69e0c580457cb46
Author: Balazs Gibizer <email address hidden>
Date: Thu Dec 2 12:52:01 2021 +0100

    Migrate RequestSpec.numa_topology to use pcpuset

    When the InstanceNUMATopology OVO has changed in
    I901fbd7df00e45196395ff4c69e7b8aa3359edf6 to separately track
    pcpus from vcpus a data migration was added. This data migration is
    triggered when the InstanceNUMATopology object is loaded from the
    instance_extra table. However that patch is missed the fact that the
    InstanceNUMATopology object can be loaded from the request_spec table as
    well. So InstanceNUMATopology object in RequestSpec are not migrated.
    This could lead to errors in the scheduler when such RequestSpec object
    is used for scheduling (e.g. during a migration of a pre Victoria
    instance with cpu pinning)

    This patch adds the missing data migration.

    Change-Id: I812d720555bdf008c83cae3d81541a37bd99e594
    Closes-Bug: #1952941
    (cherry picked from commit e853bb57181721725a89656b3cb3058636630a6e)
    (cherry picked from commit 7f6ec8cf546cf8f437ee94bb2308447427f54ada)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.2.0

This issue was fixed in the openstack/nova 23.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.1.0

This issue was fixed in the openstack/nova 24.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.0.0.0rc1

This issue was fixed in the openstack/nova 25.0.0.0rc1 release candidate.

Seyeong Kim (seyeongkim) wrote :

This still seems to happen in Yoga (after upgrade) when starting an instance that was created on Ussuri.

https://pastebin.ubuntu.com/p/rXwP6b85xc/

Could somebody give me any advice?

Thanks.

tags: added: sts
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/827873
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/827872
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.
