Simultaneous live migrations break the anti-affinity policy of a server group

Bug #1821755 reported by Boxiang Zhu on 2019-03-26
This bug affects 8 people
Affects: OpenStack Compute (nova) (Importance: Medium, Assigned to: Boxiang Zhu)
  Victoria series (Importance: Undecided, Unassigned)
  Wallaby series (Importance: Undecided, Unassigned)

Bug Description

Description
===========
If we live migrate two instances simultaneously, they can end up violating their server group's anti-affinity policy.

Steps to reproduce
==================
An OpenStack environment with three compute nodes (node1, node2 and node3). Create two VMs (vm1, vm2) in a server group with the anti-affinity policy, then live migrate both VMs simultaneously.

Before the live migrations, the VMs are located as follows:
node1 -> vm1
node2 -> vm2
node3

* nova live-migration vm1
* nova live-migration vm2
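
For completeness, the setup can be sketched with the classic novaclient CLI (group name, image and flavor names are placeholders; adjust for your deployment):

```shell
# Create a server group with the anti-affinity policy
nova server-group-create antigroup anti-affinity

# Boot both VMs into the group (pass the group UUID as a scheduler hint)
nova boot --image cirros --flavor m1.tiny --hint group=<group-uuid> vm1
nova boot --image cirros --flavor m1.tiny --hint group=<group-uuid> vm2

# Kick off both live migrations at (nearly) the same time
nova live-migration vm1 &
nova live-migration vm2 &
wait
```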

Expected result
===============
The live migrations of vm1 and vm2 fail, or at least do not place both VMs on the same host.

Actual result
=============
node1
node2
node3 -> vm1,vm2

Environment
===========
master branch of openstack

As described above, live migration does not account for other in-progress live migrations and simply selects a host via the scheduler filters, so both VMs can be migrated to the same host.

Boxiang Zhu (bxzhu-5355) on 2019-03-26
description: updated
Matt Riedemann (mriedem) on 2019-03-27
tags: added: live-migration scheduler
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is a long-standing known issue I believe, same for server build and evacuate (evacuate was fixed later in Rocky I think). There is a late affinity check in the compute service to check for the race in the scheduler and then reschedule for server create to another host, or fail in the case of evacuate. There is no such late affinity check for other move operations like live migration, cold migration (resize) or unshelve.

I believe StarlingX's nova fork has some server group checks in the live migration task though, so maybe those fixes could be 'upstreamed' to nova:

https://github.com/starlingx-staging/stx-nova/blob/3155137b8a0f00cfdc534e428037e1a06e98b871/nova/conductor/tasks/live_migrate.py#L88

Looking at that StarlingX code, they basically check to see if the server being live migrated is in an anti-affinity group and if so they restrict scheduling via external lock to one live migration at a time, which might be OK in a small edge node with 1-2 compute nodes but would be pretty severe in a large public cloud with lots of concurrent live migrations. Granted it's only the scheduling portion of the live migration task, not the actual live migration of the guest itself once a target host is selected. I'm also not sure if that external lock would be sufficient if you have multiple nova-conductors running on different hosts unless you were using a distributed lock manager like etcd, which nova upstream does not use (I'm not sure if oslo.concurrency can be configured for etcd under the covers or not).
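The serialization idea described above can be sketched as follows. This is not the actual StarlingX code: the real implementation uses oslo.concurrency external (file-based) locks, and `schedule_live_migration` and its arguments are hypothetical names. A `threading.Lock` stands in here for illustration and, as noted above, would NOT coordinate conductors running on separate hosts.

```python
import threading
from collections import defaultdict

# One lock per server group; an external/distributed lock plays this role
# in a real multi-host deployment.
_group_locks = defaultdict(threading.Lock)

def schedule_live_migration(group_id, group_hosts, candidates):
    """Pick a destination for one group member, serialized per group.

    group_hosts is the mutable list of hosts the group already occupies.
    Holding the lock while we both read it and record our claim closes the
    window in which two concurrent migrations pick the same host.
    """
    with _group_locks[group_id]:
        for host in candidates:
            if host not in group_hosts:
                group_hosts.append(host)  # claim before releasing the lock
                return host
        raise RuntimeError("no host satisfies anti-affinity for %s" % group_id)
```

With the lock, two concurrent calls for the same group are forced to pick different hosts; without it, both could read the same `group_hosts` snapshot and return the same destination.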

Long-term this should all be resolved with placement when we can model affinity and anti-affinity in the placement service.

tags: added: starlingx
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Chris Friesen (cbf123) wrote :

Just to add a comment...I can confirm the external lock would not be sufficient if you have nova-conductor services running on multiple physical hosts.

Revision history for this message
Boxiang Zhu (bxzhu-5355) wrote :

Hi Matt

To summarize your comments, I think there are three ways to fix this issue:
1. Do the same as build and evacuate (both of which were fixed before Stein): add a late affinity check in the compute service to catch the scheduler race, then reschedule or fail. [1] We could add the same validation for the other move operations.
2. Like the StarlingX code, add an external lock (or replace it with a distributed lock, e.g. via the Tooz library used by Cinder, which nova does not currently use).
3. Long-term, model affinity and anti-affinity in the placement service.

For the short term, I'd like to go with the first option. What do you think?

[1] https://github.com/openstack/nova/blob/stable/stein/nova/compute/manager.py#L1358-L1411
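
The late check in option #1 reduces to something like the following sketch (hypothetical names, loosely modeled on the linked `_validate_instance_group_policy` code): re-check the policy on the chosen destination just before the migration proceeds, against fresh data.

```python
class GroupAffinityError(Exception):
    """Raised when a move would violate the group's anti-affinity policy."""

def validate_anti_affinity(dest_host, group_policy, member_hosts):
    """Late check run for the destination host of a move operation.

    member_hosts should be re-read from the database at check time (not
    taken from the scheduler's earlier view), so a concurrent migration
    that just landed another group member here is caught.
    """
    if group_policy == "anti-affinity" and dest_host in member_hosts:
        raise GroupAffinityError(
            "anti-affinity group already has a member on %s" % dest_host)
```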

Revision history for this message
Matt Riedemann (mriedem) wrote :

I agree that option #1 (late affinity check in compute - probably during ComputeManager.pre_live_migration) is the easiest way to go, but could still potentially be racy although it should (for the most part anyway) solve a scheduling race where concurrent live migration requests are made and there are multiple schedulers running which pick the same host for servers in an anti-affinity group.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/651969

Changed in nova:
assignee: nobody → Boxiang Zhu (bxzhu-5355)
status: Triaged → In Progress
Revision history for this message
Tomi Juvonen (tomi-juvonen-q) wrote :

Also, if you live migrate to a host from which a member of the same anti-affinity group was live migrated away within the last ~70 seconds, the migration will fail because the anti-affinity filter still thinks an instance of the group is present there. I have carried a manual fix in the code for a couple of years that makes the AntiAffinity filter read up-to-date information straight from the DB. Sending SIGHUP to nova-scheduler also seems to work. So checking for parallel migrations might not be enough if the information used is not up to date.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/784166
Committed: https://opendev.org/openstack/nova/commit/33c8af1f8c46c9c37fcc28fb3409fbd3a78ae39f
Submitter: "Zuul (22348)"
Branch: master

commit 33c8af1f8c46c9c37fcc28fb3409fbd3a78ae39f
Author: Rodrigo Barbieri <email address hidden>
Date: Wed Mar 31 11:06:49 2021 -0300

    Error anti-affinity violation on migrations

    Error-out the migrations (cold and live) whenever the
    anti-affinity policy is violated. This addresses
    violations when multiple concurrent migrations are
    requested.

    Added detection on:
    - prep_resize
    - check_can_live_migrate_destination
    - pre_live_migration

    The improved method of detection now locks based on group_id
    and considers other migrations in-progress as well.

    Closes-bug: #1821755
    Change-Id: I32e6214568bb57f7613ddeba2c2c46da0320fabc
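
The detection described in the commit message (lock on the group id, count in-progress migrations as well as current hosts) can be sketched roughly as below. Function names, the migration dict shape, and the status values are illustrative assumptions, not nova's actual API.

```python
import threading
from collections import defaultdict

_group_locks = defaultdict(threading.Lock)

def hosts_in_use(member_hosts, migrations):
    """Hosts occupied by group members, plus the destinations of their
    in-progress migrations, so a migration that has been granted a host
    but not yet finished still counts against anti-affinity."""
    hosts = set(member_hosts)
    hosts.update(m["dest"] for m in migrations
                 if m["status"] in ("preparing", "running"))
    return hosts

def check_destination(group_id, dest, member_hosts, migrations):
    # Serialize per group so two concurrent checks cannot both pass
    # for the same destination host.
    with _group_locks[group_id]:
        if dest in hosts_in_use(member_hosts, migrations):
            raise RuntimeError("anti-affinity violated on %s" % dest)
```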

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/794328

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/795542

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "Boxiang Zhu <zhu.boxiang@99cloud.net>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/651969

