Comment 4 for bug 1830747

Matt Riedemann (mriedem) wrote:

This might explain what's happening during a cold migration.

Conductor creates a legacy filter_properties dict here:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L172

If the spec has an instance_group, it will call here:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L397

and _to_legacy_group_info sets these values in the filter_properties dict:

        return {'group_updated': True,
                'group_hosts': set(self.instance_group.hosts),
                'group_policies': set([self.instance_group.policy]),
                'group_members': set(self.instance_group.members)}

Note there is no group_uuid.
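To make the missing key concrete, here is a minimal sketch that reproduces the dict quoted above against a hypothetical stand-in for InstanceGroup (only the fields mentioned here are modeled, and the group uuid value is a placeholder). The source group has a uuid, but nothing in the legacy dict carries it:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstanceGroup:
    # Hypothetical stand-in for nova.objects.InstanceGroup; only the
    # fields referenced in this comment are modeled.
    policy: str
    hosts: List[str] = field(default_factory=list)
    members: List[str] = field(default_factory=list)
    uuid: Optional[str] = None


def to_legacy_group_info(group: InstanceGroup) -> dict:
    # Mirrors the dict quoted above: group.uuid is never copied.
    return {'group_updated': True,
            'group_hosts': set(group.hosts),
            'group_policies': set([group.policy]),
            'group_members': set(group.members)}


group = InstanceGroup(policy='anti-affinity',
                      hosts=['clint1-compute-5.infomaniak.ch'],
                      members=['ae6f8afe-9c64-4aaf-90e8-be8175fee8e4'],
                      uuid='example-group-uuid')  # placeholder value
legacy = to_legacy_group_info(group)
print(sorted(legacy))
# ['group_hosts', 'group_members', 'group_policies', 'group_updated']
print('group_uuid' in legacy)  # False
```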

Those filter_properties are passed to the prep_resize method on the dest compute:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/migrate.py#L304

zigo said he hit this:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4272

(10:03:07 AM) zigo: 2019-05-28 15:02:35.534 30706 ERROR nova.compute.manager [instance: ae6f8afe-9c64-4aaf-90e8-be8175fee8e4] nova.exception.UnableToMigrateToSelf: Unable to migrate instance (ae6f8afe-9c64-4aaf-90e8-be8175fee8e4) to current host (clint1-compute-5.infomaniak.ch).

which will trigger a reschedule here:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4348
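A simplified sketch of the raise-then-reschedule path. Everything here is hypothetical: UnableToMigrateToSelf is a local stand-in for the nova exception, and the same_flavor/supports_same_host conditions are an assumption about what the same-host guard checks, not a copy of the Rocky compute manager code:

```python
class UnableToMigrateToSelf(Exception):
    """Hypothetical stand-in for nova.exception.UnableToMigrateToSelf."""


def prep_resize(instance_host, this_host, same_flavor, supports_same_host):
    # Assumed guard: a cold migration (same flavor) that lands back on the
    # source host is rejected unless the driver supports it.
    if instance_host == this_host and same_flavor and not supports_same_host:
        raise UnableToMigrateToSelf(
            'Unable to migrate instance to current host (%s).' % this_host)


def prep_resize_with_reschedule(instance_host, this_host, reschedule):
    try:
        prep_resize(instance_host, this_host, same_flavor=True,
                    supports_same_host=False)
        return 'prepared'
    except UnableToMigrateToSelf as exc:
        # Instead of failing outright, hand the request back to try
        # another host.
        return reschedule(exc)


result = prep_resize_with_reschedule(
    'clint1-compute-5.infomaniak.ch', 'clint1-compute-5.infomaniak.ch',
    reschedule=lambda exc: 'rescheduled: %s' % exc)
print(result)
```

The key point for this bug is not the guard itself but that the exception path sends the request back to conductor.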

The _reschedule_resize_or_reraise method will set up the parameters for the resize_instance compute task RPC API (conductor) method:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4378-L4379

Note that in Rocky the RequestSpec is not passed back to conductor on the reschedule, only the filter_properties:

https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L1452

We only started passing the RequestSpec from compute to conductor on reschedule starting in Stein: https://review.opendev.org/#/c/582417/
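The Rocky/Stein difference can be sketched as the payload compute hands back to conductor. The function names and the sends_request_spec flag are hypothetical; the only grounded fact is that Rocky sends filter_properties alone, while Stein also sends the RequestSpec:

```python
def build_reschedule_kwargs(filter_properties, request_spec,
                            sends_request_spec):
    # Hypothetical payload builder: False models Rocky (filter_properties
    # only), True models Stein and later.
    kwargs = {'filter_properties': filter_properties}
    if sends_request_spec:
        kwargs['request_spec'] = request_spec
    return kwargs


def group_uuid_available(kwargs):
    # The legacy filter_properties never carry a group uuid, so the only
    # place conductor could find one is the RequestSpec itself.
    spec = kwargs.get('request_spec') or {}
    return spec.get('instance_group', {}).get('uuid') is not None


filter_properties = {'group_updated': True,
                     'group_policies': {'anti-affinity'}}
request_spec = {'instance_group': {'uuid': 'example-group-uuid',
                                   'policy': 'anti-affinity'}}

rocky = build_reschedule_kwargs(filter_properties, request_spec, False)
stein = build_reschedule_kwargs(filter_properties, request_spec, True)
print(group_uuid_available(rocky), group_uuid_available(stein))  # False True
```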

Without the request spec, we get here in conductor:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L307

Note that we pass in the filter_properties but no instance_group to RequestSpec.from_components.

And because there is no instance_group but there are filter_properties, we call _populate_group_info here:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L442

This means we get into this block, which sets RequestSpec.instance_group without a uuid:

https://github.com/openstack/nova/blob/stable/rocky/nova/objects/request_spec.py#L228
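A minimal sketch of that legacy branch, assuming a simplified populate_group_info that rebuilds the group only from the 'group_*' keys. The function and the InstanceGroup dataclass are hypothetical stand-ins, not the nova code; they just show that with no group_uuid in the input there is nothing to set the uuid from:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstanceGroup:
    # Hypothetical stand-in; uuid=None models "not set".
    policy: str
    hosts: List[str] = field(default_factory=list)
    members: List[str] = field(default_factory=list)
    uuid: Optional[str] = None


def populate_group_info(filter_properties):
    # Simplified model of the legacy branch: rebuild the group from the
    # old-style 'group_*' keys. No uuid is available, so it stays unset.
    if filter_properties.get('group_updated') is True:
        return InstanceGroup(
            policy=list(filter_properties['group_policies'])[0],
            hosts=list(filter_properties['group_hosts']),
            members=list(filter_properties['group_members']))
    return None


legacy = {'group_updated': True,
          'group_hosts': {'clint1-compute-5.infomaniak.ch'},
          'group_policies': {'anti-affinity'},
          'group_members': {'ae6f8afe-9c64-4aaf-90e8-be8175fee8e4'}}
group = populate_group_info(legacy)
print(group.policy, group.uuid)  # anti-affinity None
```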

Then we eventually RPC cast to prep_resize on the next host to try for the cold migration, and save the request_spec changes here:

https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L356

That is how later attempts to migrate the instance using that request spec blow up: loading the spec from the DB fails because spec.instance_group.uuid is not set.
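A rough sketch of that save/load round trip, assuming JSON persistence that skips unset fields and a loader that treats the group uuid as mandatory. InstanceGroupLoadError and both functions are hypothetical; the real failure comes from nova's object loading, whose exact exception is not shown here:

```python
import json


class InstanceGroupLoadError(Exception):
    """Hypothetical error standing in for whatever the real load raises."""


def save_request_spec(instance_group: dict) -> str:
    # Persist only fields that were actually set (None means "unset").
    stored = {k: v for k, v in instance_group.items() if v is not None}
    return json.dumps({'instance_group': stored})


def load_request_spec(blob: str) -> dict:
    data = json.loads(blob)
    # Assumption: uuid is required on load, so a group saved without it
    # cannot be rehydrated.
    if 'uuid' not in data['instance_group']:
        raise InstanceGroupLoadError('instance_group.uuid is not set')
    return data


# Group rebuilt from legacy filter_properties on reschedule: no uuid.
rebuilt_group = {'uuid': None, 'policy': 'anti-affinity',
                 'members': ['ae6f8afe-9c64-4aaf-90e8-be8175fee8e4']}
blob = save_request_spec(rebuilt_group)

try:
    load_request_spec(blob)
except InstanceGroupLoadError as exc:
    print('later migration fails:', exc)
```

The first migration "succeeds" in the sense that nothing fails at save time; the damage only surfaces on the next operation that loads the spec.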