Failed unshelve does not remove allocations from destination node

Bug #1713796 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Matt Riedemann
  Pike                    Fix Committed  High        Matt Riedemann

Bug Description

During an unshelve of an offloaded instance, conductor calls the scheduler to pick a host. The scheduler makes allocations against the chosen node as part of that select_destinations() call. Conductor then casts to that compute host to unshelve the instance.

If the spawn on the hypervisor fails after the instance claim has been made:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4485

Or if the claim test itself fails, the allocations on the destination node are not removed from Placement.

The resource tracker (RT) aborts the claim here:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L414

That calls _update_usage_from_instance but does not change the has_ocata_computes kwarg, so we end up here:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L1041

As a result, the allocations for the instance are not cleaned up.

The other case is when the claim itself fails. The instance_claim method raises ComputeResourcesUnavailable, which is handled here:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/claims.py#L161

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4491

But we don't remove allocations or do any other cleanup there.
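The following is a minimal sketch of the failure mode. It uses a toy in-memory model rather than nova's real code. FakePlacement, the plain select_destinations() function and unshelve_on_compute() below are hypothetical stand-ins. The scheduler step writes an allocation for the instance against the chosen node. The compute step then fails on either the claim or the spawn, and nothing removes the allocation.

```python
# Toy model of bug 1713796: allocations leak when an unshelve fails on the
# destination host. All names are hypothetical stand-ins, not nova APIs.


class ComputeResourcesUnavailable(Exception):
    """Stand-in for the error raised when the instance claim fails."""


class FakePlacement:
    """In-memory map: consumer (instance) UUID -> {node: resources}."""

    def __init__(self):
        self.allocations = {}

    def put_allocations(self, consumer, node, resources):
        self.allocations.setdefault(consumer, {})[node] = dict(resources)

    def get_allocations(self, consumer):
        return self.allocations.get(consumer, {})


def select_destinations(placement, instance, node, resources):
    # Hypothetical scheduler step: allocations are written to Placement
    # while picking the host, before the compute service does anything.
    placement.put_allocations(instance, node, resources)
    return node


def unshelve_on_compute(instance, node, claim_ok=True, spawn_ok=True):
    # Hypothetical compute step with the buggy behavior: a failed claim or
    # spawn propagates the error but never touches Placement.
    if not claim_ok:
        raise ComputeResourcesUnavailable(f"claim failed for {instance} on {node}")
    if not spawn_ok:
        raise RuntimeError(f"spawn failed for {instance} on {node}")


placement = FakePlacement()
host = select_destinations(placement, "inst-1", "node-1", {"VCPU": 1, "MEMORY_MB": 512})
try:
    unshelve_on_compute("inst-1", host, spawn_ok=False)
except RuntimeError:
    pass

# The instance is not running on node-1, yet its allocation is still there.
print(placement.get_allocations("inst-1"))
# -> {'node-1': {'VCPU': 1, 'MEMORY_MB': 512}}
```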

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/506414

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/506458

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/506414
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=56232e5de9cd2b74a2879e2e9c09099c0de2609e
Submitter: Jenkins
Branch: master

commit 56232e5de9cd2b74a2879e2e9c09099c0de2609e
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 21 19:11:31 2017 -0400

    Add recreate test for unshelve offloaded instance spawn fail

    This adds a functional test to recreate bug 1713796 where
    allocations are not cleaned up from the compute node when
    unshelving an offloaded server fails when spawning the
    guest.

    Change-Id: I3237ec954f6504513c8ef5a6ba43f57d0d2622a3
    Related-Bug: #1713796

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/507196

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/507197

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/506458
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f18202185d05e3f7e89fca6bbc17daf3c5dc4b98
Submitter: Jenkins
Branch: master

commit f18202185d05e3f7e89fca6bbc17daf3c5dc4b98
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 21 22:25:53 2017 -0400

    Remove allocations when unshelve fails on host

    When we unshelve an offloaded instance, the scheduler
    creates allocations in placement when picking a host.

    If the unshelve fails on the host, due to either the
    instance claim failing or the guest spawn failing, we
    need to remove the allocations since the instance isn't
    actually running on that host.

    Change-Id: Id2c7b7b3b4abda8a3b878fdee6806bcfe096e12e
    Closes-Bug: #1713796
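The cleanup pattern described in the commit message above can be sketched with the same kind of toy model. This is an illustration under stated assumptions, not the code from the actual change. FakePlacement, remove_allocations() and unshelve_on_compute() are hypothetical names. The claim and spawn run inside a try block. On any failure, the allocations for the instance against this node are removed before the original exception is re-raised.

```python
# Sketch of the cleanup pattern: remove the instance's allocations on the
# destination node if the claim or spawn fails there. All names are
# hypothetical stand-ins, not nova's real classes or methods.


class ComputeResourcesUnavailable(Exception):
    """Stand-in for the error raised when the instance claim fails."""


class FakePlacement:
    """In-memory map: consumer (instance) UUID -> {node: resources}."""

    def __init__(self):
        self.allocations = {}

    def put_allocations(self, consumer, node, resources):
        self.allocations.setdefault(consumer, {})[node] = dict(resources)

    def remove_allocations(self, consumer, node):
        # Drop this node's allocation; forget the consumer if none remain.
        per_node = self.allocations.get(consumer, {})
        per_node.pop(node, None)
        if not per_node:
            self.allocations.pop(consumer, None)

    def get_allocations(self, consumer):
        return self.allocations.get(consumer, {})


def unshelve_on_compute(placement, instance, node, claim, spawn):
    """Run the claim and spawn; on any failure, clean up and re-raise."""
    try:
        claim(instance, node)
        spawn(instance, node)
    except Exception:
        # The instance is not actually running on this node, so the
        # allocations the scheduler made for it here are removed.
        placement.remove_allocations(instance, node)
        raise


def ok(instance, node):
    pass


def failing_claim(instance, node):
    raise ComputeResourcesUnavailable(f"claim failed for {instance} on {node}")


def failing_spawn(instance, node):
    raise RuntimeError(f"spawn failed for {instance} on {node}")


placement = FakePlacement()
cases = [("inst-a", failing_claim, ok), ("inst-b", ok, failing_spawn), ("inst-c", ok, ok)]
for name, claim, spawn in cases:
    placement.put_allocations(name, "node-1", {"VCPU": 1})
    try:
        unshelve_on_compute(placement, name, "node-1", claim, spawn)
    except Exception as exc:
        print(name, type(exc).__name__, placement.get_allocations(name))
    else:
        print(name, "ok", placement.get_allocations(name))
# inst-a ComputeResourcesUnavailable {}
# inst-b RuntimeError {}
# inst-c ok {'node-1': {'VCPU': 1}}
```

Re-raising keeps the original error visible to the caller, so the unshelve still fails as before, just without leaving a stale allocation behind. In nova itself, cleanup-then-reraise is often written with oslo.utils' excutils.save_and_reraise_exception(). This sketch does not claim that the actual fix used it.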

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/507196
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=819aea74f3952ff39580396cf0dbfe98e2ee259d
Submitter: Jenkins
Branch: stable/pike

commit 819aea74f3952ff39580396cf0dbfe98e2ee259d
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 21 19:11:31 2017 -0400

    Add recreate test for unshelve offloaded instance spawn fail

    This adds a functional test to recreate bug 1713796 where
    allocations are not cleaned up from the compute node when
    unshelving an offloaded server fails when spawning the
    guest.

    Change-Id: I3237ec954f6504513c8ef5a6ba43f57d0d2622a3
    Related-Bug: #1713796
    (cherry picked from commit 56232e5de9cd2b74a2879e2e9c09099c0de2609e)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/507197
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a2e0bfdea8b36755e01e210f11f206f436e8d8ec
Submitter: Jenkins
Branch: stable/pike

commit a2e0bfdea8b36755e01e210f11f206f436e8d8ec
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 21 22:25:53 2017 -0400

    Remove allocations when unshelve fails on host

    When we unshelve an offloaded instance, the scheduler
    creates allocations in placement when picking a host.

    If the unshelve fails on the host, due to either the
    instance claim failing or the guest spawn failing, we
    need to remove the allocations since the instance isn't
    actually running on that host.

    Change-Id: Id2c7b7b3b4abda8a3b878fdee6806bcfe096e12e
    Closes-Bug: #1713796
    (cherry picked from commit f18202185d05e3f7e89fca6bbc17daf3c5dc4b98)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.2

This issue was fixed in the openstack/nova 16.0.2 release.
