live migration does not honor server group policy

Bug #1600251 reported by Paul Carlton on 2016-07-08
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

The live migration task takes the request spec that was created when the instance was created and passes it to the scheduler to find a new host, marking the instance's current host as excluded. This request spec includes the instance's group object, which contains a list of the instances in the group. The problem is that the instance group object in the request spec reflects the group membership at the time the instance was created. Thus, if you migrate the first instance assigned to an anti-affinity group, the scheduler will think the group has no other member instances, and no compute nodes will be excluded. Only the most recently created instance in the anti-affinity group will correctly exclude all nodes containing members of its group!
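The staleness described above can be illustrated with a toy model. This is only a sketch: the `boot` helper and the dictionary layout are hypothetical and are not nova's actual objects, and it assumes the instance is added to the group before the copy is stored.

```python
import copy


def boot(name, group, stored_specs):
    """Add an instance to the group and store a copy of the group in its spec.

    Hypothetical helper: it mimics a spec that captures group membership
    at creation time and is never updated afterwards.
    """
    group["members"].append(name)
    stored_specs[name] = {"instance": name, "group": copy.deepcopy(group)}


group_x = {"policy": "anti-affinity", "members": []}
stored_specs = {}

boot("A", group_x, stored_specs)
boot("B", group_x, stored_specs)

print(stored_specs["A"]["group"]["members"])  # ['A']       (stale: B is missing)
print(stored_specs["B"]["group"]["members"])  # ['A', 'B']  (newest instance is complete)
print(group_x["members"])                     # ['A', 'B']  (the live group)
```

Only the spec of the most recently booted instance matches the live group, which is consistent with the behaviour reported here.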

There is code to update the instance group object in the request spec, but the request spec is only updated with this information if it is created by the live migration task, i.e. when the instance has no entry in the request_specs database table. This is only the case for instances created before the request_specs table was implemented.
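A minimal sketch of that control flow, under stated assumptions: `spec_for_live_migration`, its parameters, and the `always_refresh` switch are all hypothetical names used for illustration, not nova code, and the switch is only one conceivable remedy, not necessarily what the proposed review does.

```python
def spec_for_live_migration(instance, stored_specs, live_members,
                            always_refresh=False):
    """Return the spec a toy live-migration task would hand to the scheduler.

    Hypothetical model: group membership is refreshed only when the spec
    had to be created here, unless always_refresh is set.
    """
    spec = stored_specs.get(instance)
    created_now = spec is None
    if created_now:
        # Legacy instance: nothing was persisted, so build a spec now.
        spec = {"instance": instance, "group_members": []}
    else:
        spec = dict(spec)  # work on a copy of the persisted spec
    if created_now or always_refresh:
        spec["group_members"] = list(live_members)
    return spec


live_members = ["A", "B"]
persisted_specs = {"A": {"instance": "A", "group_members": ["A"]}}

persisted = spec_for_live_migration("A", persisted_specs, live_members)
legacy = spec_for_live_migration("A", {}, live_members)
refreshed = spec_for_live_migration("A", persisted_specs, live_members,
                                    always_refresh=True)

print(persisted["group_members"])  # ['A']       (stale, persisted spec reused)
print(legacy["group_members"])     # ['A', 'B']  (spec created by the task)
print(refreshed["group_members"])  # ['A', 'B']  (refresh forced)
```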

Changed in nova:
assignee: nobody → Paul Carlton (paul-carlton2)

Fix proposed to branch: master
Review: https://review.openstack.org/339588

Changed in nova:
status: New → In Progress
Paul Murray (pmurray) wrote :

To reproduce:

With two compute hosts:
1. create an instance group X with the anti-affinity policy
2. boot a vm A with --hint group=X
3. boot a vm B with --hint group=X

Now there should be one vm on each host

4. live migrating vm A works (the anti-affinity policy is not respected)
5. live migrating vm B does not work (the anti-affinity policy is respected)
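The outcome of steps 4 and 5 can be reproduced with a toy scheduler. This is a sketch only: `candidate_hosts` and the host names are hypothetical, and the snapshots assume each instance's stored group membership was captured when that instance was booted.

```python
def candidate_hosts(instance, snapshot_members, placement, all_hosts):
    """Hosts a toy anti-affinity scheduler would accept for a live migration."""
    current = placement[instance]
    # Hosts running other group members, as far as the stored snapshot knows.
    occupied = {placement[m] for m in snapshot_members if m != instance}
    return sorted(set(all_hosts) - {current} - occupied)


all_hosts = ["host1", "host2"]
placement = {"A": "host1", "B": "host2"}
# Group membership as captured at boot time of each instance (assumption).
snapshots = {"A": ["A"], "B": ["A", "B"]}

targets_a = candidate_hosts("A", snapshots["A"], placement, all_hosts)
targets_b = candidate_hosts("B", snapshots["B"], placement, all_hosts)

print(targets_a)  # ['host2']  (migration allowed onto B's host: policy violated)
print(targets_b)  # []         (no valid host: policy respected)
```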

Confirmed that policies are honored only for 1 VM (the newest VM in environment) by using the same steps to reproduce this issue.

Changed in nova:
importance: Undecided → High
tags: added: affinity anti-affinity live-migration mitaka-backport-potential
Sylvain Bauza (sylvain-bauza) wrote :

So, I thought I had found the cause of the bug: that we were not persisting the instance group information when creating the RequestSpec. But that's not right; we are correctly setting the field.

Now, I'm trying to understand why we have those problems and why calling setup_instance_group() could help us given that InstanceGroup.hosts is a lazy-load field.

Paul Murray (pmurray) wrote :

We re-tested the case where the request spec has not been persisted and confirm that the behaviour is correct. So the bug only manifests if the request spec has been persisted.

Change abandoned by Paul Carlton (<email address hidden>) on branch: master
Review: https://review.openstack.org/339588
Reason: superseded by https://review.openstack.org/#/c/344380/

Paul Carlton (paul-carlton2) wrote :

Change of plan, https://review.openstack.org/339588 restored

Paul Murray (pmurray) on 2016-09-07
tags: added: newton-rc-potential
Matt Riedemann (mriedem) wrote :

If this is a latent bug and is backport potential for mitaka then I don't think we need to hold up the newton release for this.

tags: removed: newton-rc-potential
Changed in nova:
assignee: Paul Carlton (paul-carlton2) → Pawel Koniszewski (pawel-koniszewski)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/339588
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → New
assignee: Pawel Koniszewski (pawel-koniszewski) → nobody
Sean Dague (sdague) on 2017-08-01
Changed in nova:
status: New → Confirmed