Migration of a VM deployed as part of a group fails with NoValidHost

Bug #1680773 reported by Arun Mani
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Arun Mani

Bug Description

Unable to migrate a VM that was originally deployed as part of a multi-VM deploy request (e.g. the number of instances was set to greater than 1 in the UI/REST API).

Steps to reproduce:
- Set up the controller and register compute nodes
- Now, try a multi-deploy of VMs
- Once the deploy succeeds, try an untargeted migration of one of the VMs deployed as part of the group ("group" here means several VMs deployed with a single request, NOT a server group)
- The operation fails with a NoValidHost error at the scheduler

The issue is that the request spec the scheduler receives during migration has num_instances greater than 1 (however many instances were initially deployed). This is expected on the initial deploy, but not on a later migration of a single VM.

The problem seems to be related to nova.compute.api._provision_instances().
In mitaka it was:
                req_spec = objects.RequestSpec.from_components(context,
                        instance_uuid, boot_meta, instance_type,
                        base_options['numa_topology'],
                        base_options['pci_requests'], filter_properties,
                        instance_group, base_options['availability_zone'])
                req_spec.create()
In ocata it is:
                req_spec = objects.RequestSpec.from_components(context,
                        instance_uuid, boot_meta, instance_type,
                        base_options['numa_topology'],
                        base_options['pci_requests'], filter_properties,
                        instance_group, base_options['availability_zone'],
                        security_groups=security_groups)
                # NOTE(danms): We need to record num_instances on the request
                # spec as this is how the conductor knows how many were in this batch.
                req_spec.num_instances = num_instances
                req_spec.create()

In mitaka, on deploy, the RequestSpec was first saved to the DB and num_instances was then set on the in-memory object on the fly to the number of instances in the batch. So on deploy the scheduler got an object with num_instances equal to the number deployed, but what got saved in the DB was the default value of 1. On later migrations, the new RequestSpec object built from the DB information therefore carried the default value of 1.
In ocata, the local object's num_instances is updated first and then the DB record is created, so the DB copy also has the larger value. When a migration is attempted on one of the VMs, the new RequestSpec object created for the migration carries this larger value, causing the migration to fail at the scheduler.
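The effect of the ordering change can be illustrated with a minimal sketch. The classes below are toy stand-ins (not actual nova code) that only model the one behavior at issue: whatever num_instances holds at create() time is what the DB keeps and what a later reload sees.

```python
class FakeDB:
    """Toy stand-in for the request_specs table."""
    def __init__(self):
        self.rows = {}

class RequestSpec:
    """Toy stand-in for objects.RequestSpec; num_instances defaults to 1."""
    def __init__(self, db, uuid):
        self.db = db
        self.uuid = uuid
        self.num_instances = 1  # default, as in the real object

    def create(self):
        # Persist the current field values to the "DB".
        self.db.rows[self.uuid] = {'num_instances': self.num_instances}

    @classmethod
    def get_by_instance_uuid(cls, db, uuid):
        # Rebuild the object from persisted data, as a migration would.
        spec = cls(db, uuid)
        spec.num_instances = db.rows[uuid]['num_instances']
        return spec

db = FakeDB()

# Mitaka-style ordering: create() first, set num_instances in memory after.
spec = RequestSpec(db, 'vm-1')
spec.create()
spec.num_instances = 3          # scheduler sees 3 on the initial boot...
                                # ...but the DB kept the default 1

# Ocata-style ordering: set num_instances first, then create().
spec = RequestSpec(db, 'vm-2')
spec.num_instances = 3
spec.create()                   # the DB now stores 3

# On a later migration, the spec is reloaded from the DB:
mitaka_reloaded = RequestSpec.get_by_instance_uuid(db, 'vm-1').num_instances  # 1
ocata_reloaded = RequestSpec.get_by_instance_uuid(db, 'vm-2').num_instances   # 3
```

With the ocata ordering, the reloaded spec tells the scheduler it must place 3 instances even though only one is being migrated, which is what produces the NoValidHost failure described above.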

Arun Mani (arun-mani)
Changed in nova:
assignee: nobody → Arun Mani (arun-mani)
Arun Mani (arun-mani)
description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :

What group policy did you use when creating the group? affinity? anti-affinity? other?

Revision history for this message
Matt Riedemann (mriedem) wrote :

(9:02:12 AM) gibi: mriedem: migrating affinty group is not possible today. you will get no valid host. you can evacuate an affinity group with --force flag

Revision history for this message
Matt Riedemann (mriedem) wrote :

Assuming the server group was created with an 'affinity' policy, this is working as designed, or rather, it's a limitation: Nova does not currently support migrating an instance that is affined to a group on the same host, since moving it off that host would violate the affinity policy.

At this point the best we can do is document the limitation in the API reference.

If your server group is anti-affinity or some other setup, please describe in more detail.

Changed in nova:
status: New → Incomplete
Arun Mani (arun-mani)
Changed in nova:
status: Incomplete → In Progress
Revision history for this message
Arun Mani (arun-mani) wrote :

@mriedem, I've updated the description with more details on the problem. Hope it helps in understanding the problem better. In any case, this isn't related to server group.

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/461213

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Stephen Finucane (<email address hidden>) on branch: master
Review: https://review.openstack.org/461213

Revision history for this message
Matt Riedemann (mriedem) wrote :

This was fixed a while ago; I need to find the duplicate bug.
