Requested host during cold migrate is ignored if server created before Rocky

Bug #1815153 reported by Matt Riedemann
Affects                    Status         Importance  Assigned to
OpenStack Compute (nova)   Fix Released   High        Takashi Natsume
Rocky                      Fix Committed  High        Takashi Natsume

Bug Description

I stumbled across this during a failing functional test:

https://review.openstack.org/#/c/635668/2/nova/conductor/tasks/migrate.py@263

In Rocky, new RequestSpec objects have the is_bfv field set, but change https://review.openstack.org/#/c/583715/ was added to 'heal' old RequestSpecs when servers created before Rocky are migrated (cold migrate, live migrate, unshelve and evacuate).
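The heal logic can be sketched roughly like this; FakeSpec and the function body are illustrative stand-ins, not nova's actual code:

```python
class FakeSpec:
    """Illustrative stand-in for a pre-Rocky RequestSpec with no is_bfv."""
    def __init__(self):
        self.saved = False

    def save(self):
        # In nova this persists the spec; in this sketch we just record it.
        self.saved = True


def heal_reqspec_is_bfv(spec, is_volume_backed):
    # If the field was never set (server created before Rocky), fill it in
    # and persist the change. This save() call is where the bug bites: the
    # real save() also reset requested_destination.
    if hasattr(spec, 'is_bfv'):
        return
    spec.is_bfv = is_volume_backed
    spec.save()
```

Specs that already have the field set are left alone, so only old records trigger the extra save().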

The problem is that change https://review.openstack.org/#/c/610098/ made the RequestSpec.save() operation stop persisting the requested_destination field. As a result, when heal_reqspec_is_bfv saves the is_bfv change to the RequestSpec, requested_destination is lost and the user-specified target host is not honored. This would impact all move APIs that target a specific host: cold migrate, live migrate and evacuate.

A simple way to fix it is to avoid overwriting a set requested_destination field during save(): don't persist it in the database, but also don't reset it to None on the in-memory object:

https://review.openstack.org/#/c/635668/2/nova/objects/request_spec.py@517
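A minimal sketch of that approach, using a simplified stand-in for the object (the field names match nova's; everything else is hypothetical):

```python
class SketchRequestSpec:
    """Simplified stand-in showing a save() that skips non-persistent fields."""

    # Fields that must survive save() in memory but never hit the database.
    NON_PERSISTENT_FIELDS = ('requested_destination', 'network_metadata', 'retry')

    def __init__(self):
        self.is_bfv = None
        self.requested_destination = None
        self.network_metadata = None
        self.retry = None
        self._db_row = {}  # pretend database row

    def save(self):
        for field, value in vars(self).items():
            if field.startswith('_') or field in self.NON_PERSISTENT_FIELDS:
                # Don't persist these fields, and crucially don't reset them
                # to None on the in-memory object either.
                continue
            self._db_row[field] = value
```

With this pattern, a caller that sets requested_destination before a save() triggered by heal_reqspec_is_bfv still sees it afterwards, while the field never lands in the database row.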

This could also be a problem for the 'network_metadata' field added in Rocky:

https://review.openstack.org/#/c/564442/

Tags: migration
Changed in nova:
assignee: nobody → Takashi NATSUME (natsume-takashi)
Changed in nova:
status: Triaged → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

I'm not sure, but another case this could affect is rebuilding with a new image, in which case the API sets the new image on the request spec:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/compute/api.py#L3323

Although I guess we intentionally save that change, so it's probably OK. Plus this:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/compute/api.py#L3329

should make any later save() fail, although it's probably worth investigating whether an old request spec would blow up on save() here:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/conductor/manager.py#L1009

Revision history for this message
Matt Riedemann (mriedem) wrote :

Since I found this only in a functional test, I think we probably want functional tests for everywhere this could be a problem, which I think is:

1. forced live migration to a specific host
2. forced cold migration to a specific host
3. forced evacuate to a specific host
4. rebuild with a new image

I don't see anything obvious in the unshelve flow that could break because of this.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I made this change to try to test scenario #4, but the test still passed, which was unexpected:

diff --git a/nova/tests/functional/test_servers.py b/nova/tests/functional/test_servers.py
index 67c0fe9..c73d6a2 100644
--- a/nova/tests/functional/test_servers.py
+++ b/nova/tests/functional/test_servers.py
@@ -1409,6 +1409,19 @@ class ServerRebuildTestCase(integrated_helpers._IntegratedTestBase,
         allocs = allocs[rp_uuid]['resources']
         assertFlavorMatchesAllocation(flavor, allocs)

+ # Now let's hack the RequestSpec.is_bfv field to mimic migrating an
+ # old instance created before RequestSpec.is_bfv was set in the API,
+ # move the instance and verify that the RequestSpec.is_bfv is set
+ # and the instance still reports the same DISK_GB allocations as during
+ # the initial create.
+ ctxt = context.get_admin_context()
+ reqspec = objects.RequestSpec.get_by_instance_uuid(ctxt, server['id'])
+ del reqspec.is_bfv
+ reqspec.save()
+ reqspec = objects.RequestSpec.get_by_instance_uuid(ctxt, server['id'])
+ # Make sure it's not set.
+ self.assertNotIn('is_bfv', reqspec)
+
         rebuild_image_ref = (
             nova.tests.unit.image.fake.AUTO_DISK_CONFIG_ENABLED_IMAGE_UUID)
         # Now rebuild the server with a different image.

{0} nova.tests.functional.test_servers.ServerRebuildTestCase.test_rebuild_with_new_image [8.071310s] ... ok

Revision history for this message
Matt Riedemann (mriedem) wrote :

OK, I see why rebuild + new image + old request spec doesn't blow up on save() in heal_reqspec_is_bfv: the RequestSpec.save() routine doesn't rely on the id:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/objects/request_spec.py#L619

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/636271

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/636271
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=67d5970445818f2f245cf1b6d9d46c36fb220f04
Submitter: Zuul
Branch: master

commit 67d5970445818f2f245cf1b6d9d46c36fb220f04
Author: Takashi Natsume <email address hidden>
Date: Tue Feb 12 11:46:57 2019 +0900

    Fix resetting non-persistent fields when saving obj

    The 'requested_destination', 'network_metadata', 'retry' fields
    in the RequestSpec object are reset when saving the object currently.

    When cold migrating a server, the API sets the requested_destination
    so conductor will pass that information to the scheduler
    to restrict the cold migration to that host.
    But the 'heal_reqspec_is_bfv' method called from the conductor
    makes an update to the RequestSpec which resets
    the requested_destination so the server could end up being cold migrated
    to some other host than the one that was requested by the API user.

    So make them not be reset when saving the object.

    Change-Id: I2131558f0edfe603ee1e8d8bae66a3caf5182a58
    Closes-Bug: #1815153

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/641200

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/643215

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/643215
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=02d6146e4d8b5af2aae0d046a58598aa0102f53d
Submitter: Zuul
Branch: stable/rocky

commit 02d6146e4d8b5af2aae0d046a58598aa0102f53d
Author: Takashi Natsume <email address hidden>
Date: Tue Feb 12 11:46:57 2019 +0900

    Fix resetting non-persistent fields when saving obj

    The 'requested_destination', 'network_metadata', 'retry' fields
    in the RequestSpec object are reset when saving the object currently.

    When cold migrating a server, the API sets the requested_destination
    so conductor will pass that information to the scheduler
    to restrict the cold migration to that host.
    But the 'heal_reqspec_is_bfv' method called from the conductor
    makes an update to the RequestSpec which resets
    the requested_destination so the server could end up being cold migrated
    to some other host than the one that was requested by the API user.

    So make them not be reset when saving the object.

    Conflicts:
        nova/objects/request_spec.py
        nova/tests/unit/objects/test_request_spec.py

        Conflicts are due to not including the following changes in rocky.

        I53e5debcffd6de2b3a2ff838e7f5da33fa1a82b8
        Idaed39629095f86d24a54334c699a26c218c6593

    Change-Id: I2131558f0edfe603ee1e8d8bae66a3caf5182a58
    Closes-Bug: #1815153
    (cherry picked from commit 67d5970445818f2f245cf1b6d9d46c36fb220f04)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.
