_nova_check_type scheduler hint could be accidentally persisted during a rebuild with image change

Bug #1823369 reported by Matt Riedemann
This bug affects 2 people
Affects                   Status       Importance  Assigned to       Milestone
OpenStack Compute (nova)  In Progress  Undecided   Tetsuro Nakamura
  Rocky                   New          Undecided   Unassigned
  Stein                   New          Undecided   Unassigned

Bug Description

This is based on code inspection and is related to bug 1815153 (see comments 1-4). When we rebuild a server with a new image, we go through the scheduler with a special scheduler hint:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/compute/api.py#L3336

This line is meant to avoid accidentally persisting that change:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/compute/api.py#L3329
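
To make the rebuild path concrete, here is a minimal sketch that uses a plain dict as a stand-in for the RequestSpec object. The helper name `prepare_rebuild_spec` is hypothetical, and the assumption that the guard works by dropping the `id` field is a reading of the linked lines, not a copy of nova's code:

```python
def prepare_rebuild_spec(spec, host, node):
    """Hypothetical helper: mutate a request-spec dict in place the way
    the rebuild-with-new-image path is described as doing."""
    # Guard against persisting the per-request changes below: the idea
    # is that a spec without an id should not be saved back to the DB.
    spec.pop("id", None)
    hints = spec.setdefault("scheduler_hints", {})
    # Ask the scheduler for rebuild-only checks on the current host.
    hints["_nova_check_type"] = ["rebuild"]
    # The same code path also forces the current host and node.
    spec["force_hosts"] = [host]
    spec["force_nodes"] = [node]
```

The mutation happens in place and leaves `instance_uuid` untouched, so any later save of this same object still knows which server it belongs to.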

However, RequestSpec.save() does not use the id field; it looks up the RequestSpec in the DB by its instance_uuid field to save the changes:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/objects/request_spec.py#L619
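
The core of the bug can be reproduced with a toy model. `FakeRequestSpecTable` and `ToyRequestSpec` below are hypothetical stand-ins that assume only what the linked `save()` is described as doing, namely looking the row up by `instance_uuid`. Clearing `id` therefore does not stop the write:

```python
class FakeRequestSpecTable:
    """Hypothetical stand-in for the request_specs table, keyed by
    instance_uuid."""

    def __init__(self):
        self.rows = {}

    def get_by_instance_uuid(self, instance_uuid):
        return self.rows[instance_uuid]


class ToyRequestSpec:
    """Hypothetical in-memory spec with a save() keyed on instance_uuid."""

    def __init__(self, table, instance_uuid, spec_id=None):
        self.table = table
        self.instance_uuid = instance_uuid
        self.id = spec_id
        self.scheduler_hints = {}

    def save(self):
        # The row is located via instance_uuid, so whether self.id was
        # cleared beforehand makes no difference to what gets written.
        row = self.table.get_by_instance_uuid(self.instance_uuid)
        row["scheduler_hints"] = dict(self.scheduler_hints)


table = FakeRequestSpecTable()
table.rows["uuid-1"] = {"id": 1, "scheduler_hints": {}}
spec = ToyRequestSpec(table, "uuid-1", spec_id=1)
spec.id = None  # the "guard" from the rebuild path
spec.scheduler_hints["_nova_check_type"] = ["rebuild"]
spec.save()
print(table.rows["uuid-1"]["scheduler_hints"])  # {'_nova_check_type': ['rebuild']}
```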

This means we could accidentally persist that scheduler hint at this point if we are 'healing' a volume-backed server (since Rocky):

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/conductor/manager.py#L1009
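
A sketch of how a heal step can persist the hint as a side effect. It assumes the heal backfills a missing boolean (called `is_bfv` here; treat that name and the `heal_is_bfv` helper as assumptions) and then calls `save()`, which in this toy writes every field the in-memory spec carries:

```python
class ToySpec:
    """Hypothetical spec whose save() writes all current fields,
    keyed by instance_uuid."""

    def __init__(self, store, instance_uuid):
        self.store = store  # dict: instance_uuid -> persisted fields
        self.instance_uuid = instance_uuid
        self.fields = {}

    def save(self):
        self.store[self.instance_uuid] = dict(self.fields)


def heal_is_bfv(spec, is_volume_backed):
    """Hypothetical heal step: backfill a missing flag, then save."""
    if "is_bfv" in spec.fields:
        return  # already healed, nothing is written
    spec.fields["is_bfv"] = is_volume_backed
    # This save() also writes any transient state still on the spec,
    # such as a rebuild hint added earlier in the same request.
    spec.save()
```

The heal itself is harmless; the problem is that it runs on the same object that already carries the rebuild-only hint.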

The potential fallout from this is that future move operations of that server could only run a subset of the scheduler filters:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/scheduler/manager.py#L125

They might also skip the call to placement entirely.
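
The scheduler-side effect can be modelled as follows. This is a toy, not nova's scheduler: `ToyFilter`, `run_on_rebuild`, and `select_hosts` are hypothetical names, loosely modelled on the idea that only rebuild-safe filters run and placement is skipped when the rebuild hint is present:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class ToyFilter:
    name: str
    run_on_rebuild: bool  # hypothetical flag: is this filter rebuild-safe?
    passes: Callable[[str], bool]


def is_rebuild(hints: Dict[str, List[str]]) -> bool:
    return hints.get("_nova_check_type") == ["rebuild"]


def select_hosts(hosts: Iterable[str], hints: Dict[str, List[str]],
                 filters: List[ToyFilter],
                 placement_candidates: Set[str]) -> List[str]:
    if is_rebuild(hints):
        # Placement is not consulted and only rebuild-safe filters run.
        candidates = list(hosts)
        active = [f for f in filters if f.run_on_rebuild]
    else:
        # Normal path: start from placement's candidates, run all filters.
        candidates = [h for h in hosts if h in placement_candidates]
        active = filters
    return [h for h in candidates if all(f.passes(h) for f in active)]
```

If none of the filters is marked rebuild-safe, a persisted hint lets every host through on a later move, including hosts in other zones and hosts that are down.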

Matt Riedemann (mriedem) wrote :

force_hosts/force_nodes could also be accidentally persisted in this case:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/compute/api.py#L3337-L3338
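
One way to close the gap, sketched here as a hypothetical approach and not necessarily what the proposed reviews do, is to strip the per-request fields when building what gets persisted, instead of relying on the `id` guard. The field lists below are assumptions based on the fields named in this report:

```python
import copy

# Hypothetical lists of per-request state that should never reach the DB.
TRANSIENT_HINTS = ("_nova_check_type",)
TRANSIENT_FIELDS = ("force_hosts", "force_nodes")


def persistable_copy(spec):
    """Return a copy of a request-spec dict with per-request rebuild
    state removed; the caller's dict is left untouched."""
    clean = copy.deepcopy(spec)
    hints = clean.get("scheduler_hints") or {}
    for key in TRANSIENT_HINTS:
        hints.pop(key, None)
    for field in TRANSIENT_FIELDS:
        clean.pop(field, None)
    return clean
```

Working on a copy keeps the hint and the forced host/node on the in-flight request, so the current rebuild still gets the rebuild-only scheduling it needs while the stored spec stays clean.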

Changed in nova:
importance: Medium → Undecided
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/650376

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/655278

Changed in nova:
assignee: Matt Riedemann (mriedem) → Tetsuro Nakamura (tetsuro0907)
Changed in nova:
assignee: Tetsuro Nakamura (tetsuro0907) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/650376
Reason: I'm no longer working on this.

Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Triaged
Jacolex (jacolex) wrote :

Hello
I think I experienced this bug. I have an instance that behaves very strangely during resize. When the resize occurs, the nova scheduler returns all hosts from all availability zones (including hosts that are down), and the instance is migrated to another availability zone.
I spent 3 days investigating the issue. Finally, I found that the instance has something different in the request_specs table:
... "scheduler_hints": {"_nova_check_type": ["rebuild"]}, ...

So I think this is the problem.
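
For operators wondering whether they are affected, a small check over the persisted specs can flag rows like the one above. It assumes you have already fetched `(instance_uuid, spec)` pairs from the `request_specs` table with the spec as a JSON string; the helper names are hypothetical, and the search is recursive so it does not depend on the exact nesting of the serialized object:

```python
import json
from typing import Any, Iterable, List, Tuple


def has_leaked_check_type(node: Any) -> bool:
    """Recursively look for a scheduler_hints mapping that still
    carries the per-request _nova_check_type key."""
    if isinstance(node, dict):
        hints = node.get("scheduler_hints")
        if isinstance(hints, dict) and "_nova_check_type" in hints:
            return True
        return any(has_leaked_check_type(v) for v in node.values())
    if isinstance(node, list):
        return any(has_leaked_check_type(v) for v in node)
    return False


def find_affected(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """rows: (instance_uuid, spec_json) pairs (assumed column layout)."""
    return [uuid for uuid, spec_json in rows
            if has_leaked_check_type(json.loads(spec_json))]
```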

Matt Riedemann (mriedem) wrote :

@Jacolex, maybe you can test out Tetsuro's fix and see if it resolves your issue:

https://review.opendev.org/#/c/655278/

If it does, please leave a comment saying so on the patch review.

Changed in nova:
status: Triaged → In Progress
assignee: nobody → Tetsuro Nakamura (tetsuro0907)