Compute nodes will fight over allocations during migration

Bug #1707071 reported by Dan Smith
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Jay Pipes
Ocata
Confirmed
Medium
Unassigned

Bug Description

As far back as Ocata, compute nodes that manage allocations will end up overwriting allocations from other compute nodes when doing a migration. This stems from the fact that the Resource Tracker was designed to manage a per-compute-node set of accounting, but placement is per-instance accounting. When we try to create/update/delete allocations for instances on compute nodes from the existing resource tracker code paths, we end up deleting allocations that apply to other compute nodes in the process.

For example, when an instance A is running against compute1, there is an allocation for its resources against that node. When migrating that instance to compute2, the target compute (or scheduler) may create allocations for instance A against compute2, which overwrite those for compute1. Then, compute1's periodic healing task runs, and deletes the allocation for instance A against compute2, replacing it with one for compute1. When migration completes, compute2 heals again and overwrites the allocation with one for the new home of the instance. Then, compute1 may delete the allocation it thinks it owns, followed finally by another heal on compute2. While this is going on, the scheduler (via placement) does not have a consistent view of resources to make proper decisions.

In order to fix this, we need a combination of changes:

1. There should be allocations against both compute nodes for an instance during a migration
2. Compute nodes should respect the double claim, and not delete allocations for instances it used to own, if the allocation has no resources for its resource provider
3. Compute nodes should not delete allocations for instances unless they own the instance _and_ the instance is in DELETED/SHELVED_OFFLOADED state

Revision history for this message
Matt Riedemann (mriedem) wrote :

I pushed up a sanity check patch:

https://review.openstack.org/#/c/488187/2

for this and got a hit in the live migration job:

http://logs.openstack.org/87/488187/2/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/107a810/logs/subnode-2/screen-n-cpu.txt.gz#_Jul_27_22_21_14_766229

Jul 27 22:21:14.766229 ubuntu-xenial-2-node-rax-iad-10131407-754260 nova-compute[921]: WARNING nova.scheduler.client.report [None req-e1963d3c-9cb0-4980-a397-7314c7f483fa tempest-LiveMigrationRemoteConsolesV26Test-535663336 tempest-LiveMigrationRemoteConsolesV26Test-535663336] [instance: 6a8ff75c-22e3-4a17-b0d3-ac1b44f9f7c3] Removing allocations for instance which are currently against more than one compute node resource provider. Current allocations: {u'13b1e5e0-66ef-4533-9a07-b1a3220d6b00': {u'generation': 8, u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}, u'7aa9619d-db83-4da9-b822-f4d66e7143f8': {u'generation': 6, u'resources': {u'VCPU': 1, u'MEMORY_MB': 64}}}

I have not seen a hit on the case that the source node is deleting allocations for an instance but the source node UUID is not part of the current allocations, which is a race that can exist in a slow system, but probably not slow enough for the upstream CI system which doesn't have that much traffic.

tags: added: compute placement resource-tracker
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :

From the three points:

1. Is handled by https://review.openstack.org/#/c/487589/

2. We could work into https://review.openstack.org/#/c/488187/

3. Jay Pipes is working on this one I think.

Jay Pipes (jaypipes)
Changed in nova:
assignee: nobody → Jay Pipes (jaypipes)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/488510

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/488595

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/489714

Matt Riedemann (mriedem)
tags: added: pike-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/489714
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dea7a19c361be8180c35ed09ce12434de0e9fc87
Submitter: Jenkins
Branch: master

commit dea7a19c361be8180c35ed09ce12434de0e9fc87
Author: Jay Pipes <email address hidden>
Date: Tue Aug 1 12:55:48 2017 -0400

    Additional assertions to resize tests

    Adds a number of assertions that usage and allocation records are what
    we expect at the following times:

    * After boot on the source host but before running periodics After
    * resize operation but before confirming or reverting and before
      running periodics

    Change-Id: I23ed64d5c48f520f115d64a97c748e5b49bb5f20
    Related-bug: #1707071

Changed in nova:
assignee: Jay Pipes (jaypipes) → Dan Smith (danms)
Changed in nova:
assignee: Dan Smith (danms) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Chris Dent (cdent)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Jay Pipes (<email address hidden>) on branch: master
Review: https://review.openstack.org/488595
Reason: I'll come back to this later...

Changed in nova:
assignee: Chris Dent (cdent) → Dan Smith (danms)
Changed in nova:
assignee: Dan Smith (danms) → Jay Pipes (jaypipes)
Changed in nova:
assignee: Jay Pipes (jaypipes) → Dan Smith (danms)
Changed in nova:
assignee: Dan Smith (danms) → Jay Pipes (jaypipes)
Changed in nova:
assignee: Jay Pipes (jaypipes) → Dan Smith (danms)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/491850

Changed in nova:
assignee: Dan Smith (danms) → Jay Pipes (jaypipes)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/491850
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=09e169bd7a74deaf52d9d6d38ff37799cab24157
Submitter: Jenkins
Branch: master

commit 09e169bd7a74deaf52d9d6d38ff37799cab24157
Author: Jay Pipes <email address hidden>
Date: Mon Aug 7 16:36:49 2017 -0400

    placement: refactor healing of allocations in RT

    Updates the RT method for "healing" instance allocations so that we
    only delete allocations (and delete them entirely for an instance) in
    the event that the instance has been local-deleted. Adds log statements
    to the method to give us a good idea of what the state of various
    things are during test runs.

    Note that this patch only partially fixes LP #1707071 to remove some of
    the back-and-forth allocation trampling that was going on for move
    operations. The next patch adds the final piece of the puzzle which is
    to have the confirm/revert resize step on the source or destination
    compute host *correct* a doubled-up allocation. What this patch does is
    prevent the healing function from deleting allocations unless the
    instance is local-deleted.

    Partial-bug: #1707071

    Change-Id: Ia0bb32fd92ae723c505d0fce5691e0fe0540f10d

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/488510
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5390210a4fa46e2af6b6aec9b41c03147b52760c
Submitter: Jenkins
Branch: master

commit 5390210a4fa46e2af6b6aec9b41c03147b52760c
Author: Jay Pipes <email address hidden>
Date: Wed Aug 2 17:48:38 2017 -0400

    Remove provider allocs in confirm/revert resize

    Now that the scheduler creates a doubled-up allocation for the duration
    of a move operation (with part of the allocation referring to the
    source and part referring to the destination host), we need to remove
    the source provider when confirming the resize and remove the
    destination provider from the allocation when reverting a resize. This
    patch adds this logic in the RT's drop_move_claim() method.

    Change-Id: I6f8afe6680f83125da9381c812016b3623503825
    Co-Authored-By: Dan Smith <email address hidden>
    Fixes-bug: #1707071

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
tags: removed: pike-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/496847

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/496847
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e3b7f43e3ae9689967d2ecd1cd7974d829c78c85
Submitter: Jenkins
Branch: master

commit e3b7f43e3ae9689967d2ecd1cd7974d829c78c85
Author: Matt Riedemann <email address hidden>
Date: Wed Aug 23 14:51:17 2017 -0400

    Add missing tests for _remove_deleted_instances_allocations

    Some testing gaps were pointed out during the review for change
    Ia0bb32fd92ae723c505d0fce5691e0fe0540f10d - this change adds
    the missing tests.

    Change-Id: I0cb676ef3fdbc14e9c19fff231c129612805bad6
    Related-Bug: #1707071

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.