Allocations are not removed from destination node when rescheduling during resize/migrate

Bug #1712850 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Matt Riedemann
Pike                      Fix Committed  High        Matt Riedemann

Bug Description

This is similar to bug 1712718, but instead of creating a new instance (or unshelving an offloaded instance), this is the case where a resize/cold migration fails its pre-check or claim on the destination host and gets rescheduled:

https://github.com/openstack/nova/blob/16.0.0.0rc1/nova/compute/manager.py#L3801

That is called from prep_resize, which runs on the destination node, so in that case we should call SchedulerReportClient.remove_provider_from_instance_allocation to remove the allocations against the destination node before rescheduling.

Note that we can't just remove all allocations since that would remove the allocations that already exist on the source node for the instance during a resize/migrate.
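
To illustrate the distinction, here is a minimal sketch, not nova's actual code. It assumes the instance's allocations can be modeled as a plain dict keyed by resource provider; the helper name remove_provider_allocation, the node names, and the resource amounts are all hypothetical. It removes only the destination node's entry and leaves the source node's allocations in place.

```python
# Hypothetical in-memory model of one instance's allocations in Placement,
# keyed by resource provider (compute node). This is NOT nova's real
# SchedulerReportClient; names and values are made up for illustration.
def remove_provider_allocation(allocations, provider):
    """Return a copy of the allocations without the given provider's entry."""
    return {
        rp: dict(resources)
        for rp, resources in allocations.items()
        if rp != provider
    }


# During a resize/cold migration the instance holds allocations on both nodes.
allocations = {
    "source-node": {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 1},
    "dest-node": {"VCPU": 2, "MEMORY_MB": 1024, "DISK_GB": 2},
}

# prep_resize failed on dest-node: drop only that provider before rescheduling.
remaining = remove_provider_allocation(allocations, "dest-node")
```

Deleting every allocation for the instance would instead leave `remaining` empty, which is the behaviour the note above warns against.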

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/497541

Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/497592

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/497605

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/497606

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/497541
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=78850296a54ff2e1ec298c013fa91fc6eace962b
Submitter: Jenkins
Branch: master

commit 78850296a54ff2e1ec298c013fa91fc6eace962b
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 24 14:22:20 2017 -0400

    Add functional test for rescheduling during a migration

    This adds a functional test which recreates the bug where
    the allocations on a destination node are not removed from
    Placement before rescheduling during a cold migrate operation,
    which is the same code flow as a resize even though resize
    isn't explicitly tested here.

    Change-Id: I1e3def1e98d0008240837eb1ad0eaa81a9b2d189
    Related-Bug: #1712850

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/497605
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ec8b92eceb7ecdce78037c2959cc106fd1e8f1f5
Submitter: Jenkins
Branch: stable/pike

commit ec8b92eceb7ecdce78037c2959cc106fd1e8f1f5
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 24 14:22:20 2017 -0400

    Add functional test for rescheduling during a migration

    This adds a functional test which recreates the bug where
    the allocations on a destination node are not removed from
    Placement before rescheduling during a cold migrate operation,
    which is the same code flow as a resize even though resize
    isn't explicitly tested here.

    Change-Id: I1e3def1e98d0008240837eb1ad0eaa81a9b2d189
    Related-Bug: #1712850
    (cherry picked from commit 78850296a54ff2e1ec298c013fa91fc6eace962b)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/497592
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b53133ba50457f1a4983434800ebef710da67999
Submitter: Jenkins
Branch: master

commit b53133ba50457f1a4983434800ebef710da67999
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 24 15:32:01 2017 -0400

    Cleanup allocations in failed prep_resize

    During a resize/migration, the scheduler 'doubles' the
    allocations on both the source and destination hosts, which
    could be the same host if resizing to the same host.

    If prep_resize fails, the destination node allocations were
    not getting cleaned up before rescheduling to another host.
    If it's a resize to the same host, the doubled allocation
    from the scheduler wasn't being subtracted for the single host.

    This change cleans up the allocations from the current node
    when prep_resize fails. If it's not a resize to the same host,
    we're on the destination node already. If it is a resize to
    the same host, remove_provider_from_instance_allocation in the
    SchedulerReportClient accounts for subtracting the new flavor
    from the doubled allocation.

    Change-Id: I8e81704518cef8847dc65b70a75cbd5e67f1cd39
    Closes-Bug: #1712850
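
The same-host case described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the fix: the helper subtract_flavor and the flavor values are hypothetical, and the doubled allocation is modeled as the simple sum of the old and new flavors on a single provider.

```python
# Hypothetical sketch of the same-host resize case. Not nova's code:
# subtract_flavor and the flavor dicts below are made up for illustration.
def subtract_flavor(allocation, flavor_resources):
    """Subtract a flavor's resources from an allocation on one provider.

    Resource classes that drop to zero or below are omitted from the result.
    """
    result = {}
    for resource_class, amount in allocation.items():
        left = amount - flavor_resources.get(resource_class, 0)
        if left > 0:
            result[resource_class] = left
    return result


old_flavor = {"VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 1}
new_flavor = {"VCPU": 2, "MEMORY_MB": 1024, "DISK_GB": 2}

# Resizing to the same host: the single provider carries old + new flavor.
doubled = {rc: old_flavor[rc] + new_flavor[rc] for rc in old_flavor}

# prep_resize failed: take the new flavor back out, leaving the old flavor.
restored = subtract_flavor(doubled, new_flavor)
```

When the source and destination are different hosts, the destination provider holds only the new flavor, so the whole entry for that provider can be dropped rather than subtracted.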

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/497606
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1dafea91bf6070c414c54ec0c26ebb4b6cb096c0
Submitter: Jenkins
Branch: stable/pike

commit 1dafea91bf6070c414c54ec0c26ebb4b6cb096c0
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 24 15:32:01 2017 -0400

    Cleanup allocations in failed prep_resize

    During a resize/migration, the scheduler 'doubles' the
    allocations on both the source and destination hosts, which
    could be the same host if resizing to the same host.

    If prep_resize fails, the destination node allocations were
    not getting cleaned up before rescheduling to another host.
    If it's a resize to the same host, the doubled allocation
    from the scheduler wasn't being subtracted for the single host.

    This change cleans up the allocations from the current node
    when prep_resize fails. If it's not a resize to the same host,
    we're on the destination node already. If it is a resize to
    the same host, remove_provider_from_instance_allocation in the
    SchedulerReportClient accounts for subtracting the new flavor
    from the doubled allocation.

    Change-Id: I8e81704518cef8847dc65b70a75cbd5e67f1cd39
    Closes-Bug: #1712850
    (cherry picked from commit b53133ba50457f1a4983434800ebef710da67999)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0rc2

This issue was fixed in the openstack/nova 16.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.
