Quota is decremented during instance delete in cell0 even if the instance destroy fails

Bug #1678326 reported by Matt Riedemann on 2017-03-31
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Matt Riedemann
Ocata
High
Matt Riedemann

Bug Description

This is a follow on from bug 1670627. As pointed out in comments 23 and 25, in the local delete case where there is no host and the instance is in a cell, we're decrementing quota even if the instance.destroy() operation fails.

We commit the usage decrement here:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/compute/api.py#L1878

Attempt to destroy the instance here:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/compute/api.py#L1887

And if the instance.destroy() fails, we rollback the usage decrement here:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/compute/api.py#L1891

That rollback has no effect because once we commit a reservation, it's wiped out in the quotas object:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/objects/quotas.py#L105

Attempting to rollback reservations on a quotas object that has already committed reservations is a noop:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/objects/quotas.py#L111

--

Unlike the 'normal' (pre-cellsv2) local delete case which does the commit after the instance is destroyed:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/compute/api.py#L2023

And we rollback (but not commit) if instance.destroy() fails because the instance is already deleted:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/compute/api.py#L2028

--

The code in _delete() is wrong because it was copied from the code in _delete_while_booting() which is also wrong:

https://github.com/openstack/nova/blob/88bc8dc5ce32748452c9d3acda9f35e77fedb6ce/nova/compute/api.py#L1784

So we have to fix both places.

I suggest that we also change the noop behavior on the Quotas object such that if we attempt a reservation commit or rollback operation on a Quotas object that does not have any reservations, it is considered a programming error rather than a noop, because that's how this bug was introduced in the first place.

Matt Riedemann (mriedem) on 2017-03-31
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)

Fix proposed to branch: master
Review: https://review.openstack.org/452351

Changed in nova:
status: Triaged → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/452359

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/452364

Reviewed: https://review.openstack.org/452351
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=032937ce51c64e0f5292ca3149e3b244863fffca
Submitter: Jenkins
Branch: master

commit 032937ce51c64e0f5292ca3149e3b244863fffca
Author: Matt Riedemann <email address hidden>
Date: Fri Mar 31 19:56:14 2017 -0400

    Add regression test for quota decrement bug 1678326

    This was spotted from someone validating the fix for
    bug 1670627. They reported that even though they failed
    to delete an instance in ERROR state that was in cell0,
    the quota usage was decremented.

    This is because we committed the quota reservation
    to decrement the usage before actually attempting to destroy
    the instance, rather than upon successful deletion.

    The rollback after InstanceNotFound is a noop because of
    how the Quotas.rollback method noops if the reservations
    were already committed. That is in itself arguably a bug,
    but not fixed here, especially since the counting quotas
    work in Pike will remove all of the reservations commit and
    rollback code.

    Change-Id: I12d1fa1a10f9014863123ac9cc3c63ddfb48378e
    Partial-Bug: #1678326

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/452359
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=42d4ea0b6289b1f26a44a95dcfadfc75587d7c2c
Submitter: Jenkins
Branch: master

commit 42d4ea0b6289b1f26a44a95dcfadfc75587d7c2c
Author: Matt Riedemann <email address hidden>
Date: Fri Mar 31 21:08:55 2017 -0400

    Commit usage decrement after destroying instance

    This fixes a regression in Ocata where we were always
    decrementing quota usage during instance delete even
    if we failed to delete the instance. Now the reservation
    is properly committed after the instance is destroyed.

    The related functional test is updated to show this working
    correctly now.

    Change-Id: I5200e72c195e12b5a069cbae3f209492ed569fb4
    Closes-Bug: #1678326

Reviewed: https://review.openstack.org/453859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9245bbf79ddbfd8a2e2310af654711f9d3a547b1
Submitter: Jenkins
Branch: master

commit 9245bbf79ddbfd8a2e2310af654711f9d3a547b1
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 5 16:27:41 2017 -0400

    Perform old-style local delete for shelved offloaded instances

    This fixes a regression from some local delete code added for cells v2
    where it assumed that if an instance did not have a host, it wasn't
    scheduled to a cell yet. That assumption misses the fact that the
    instance won't have a host if it was shelved offloaded. And to be
    shelved offloaded, the instance had to have first been built on a host
    in a cell.

    So we simply duplicate the same check as later in the _delete() method
    for instance.host or shelved-offloaded to decide what the case is.

    Obviously this is all a giant mess of duplicate delete path code that
    needs to be unwound, and that's the plan, but first we're fixing
    regressions and then we can start rolling this duplication all back
    so we can get back to the single local delete flow that we know and love.

    Change-Id: Ie2063f621618c1d90aeb59f0f1d7da351862ea9f
    Closes-Bug: #1678326

Reviewed: https://review.openstack.org/452363
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=01fa555b36055c1042e342d9f332f38e106d5d0d
Submitter: Jenkins
Branch: stable/ocata

commit 01fa555b36055c1042e342d9f332f38e106d5d0d
Author: Matt Riedemann <email address hidden>
Date: Fri Mar 31 19:56:14 2017 -0400

    Add regression test for quota decrement bug 1678326

    This was spotted from someone validating the fix for
    bug 1670627. They reported that even though they failed
    to delete an instance in ERROR state that was in cell0,
    the quota usage was decremented.

    This is because we committed the quota reservation
    to decrement the usage before actually attempting to destroy
    the instance, rather than upon successful deletion.

    The rollback after InstanceNotFound is a noop because of
    how the Quotas.rollback method noops if the reservations
    were already committed. That is in itself arguably a bug,
    but not fixed here, especially since the counting quotas
    work in Pike will remove all of the reservations commit and
    rollback code.

    Change-Id: I12d1fa1a10f9014863123ac9cc3c63ddfb48378e
    Partial-Bug: #1678326
    (cherry picked from commit 032937ce51c64e0f5292ca3149e3b244863fffca)

tags: added: in-stable-ocata

Reviewed: https://review.openstack.org/452364
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=361b88383130be8da9d474d02a9f62136239d506
Submitter: Jenkins
Branch: stable/ocata

commit 361b88383130be8da9d474d02a9f62136239d506
Author: Matt Riedemann <email address hidden>
Date: Fri Mar 31 21:08:55 2017 -0400

    Commit usage decrement after destroying instance

    This fixes a regression in Ocata where we were always
    decrementing quota usage during instance delete even
    if we failed to delete the instance. Now the reservation
    is properly committed after the instance is destroyed.

    The related functional test is updated to show this working
    correctly now.

    Conflicts:
          nova/compute/api.py
          nova/tests/unit/compute/test_compute_api.py

    NOTE(mriedem): The conflict is due to not having change
    edf51119fa59ff8a3337abb9107a06fa33d3c68f in stable/ocata.

    Change-Id: I5200e72c195e12b5a069cbae3f209492ed569fb4
    Closes-Bug: #1678326
    (cherry picked from commit 5c1af55cbe526dea72308767df8709064ffae6a8)

Reviewed: https://review.openstack.org/454414
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6384e935be147d3f454c264703136c5e82eebafd
Submitter: Jenkins
Branch: stable/ocata

commit 6384e935be147d3f454c264703136c5e82eebafd
Author: Matt Riedemann <email address hidden>
Date: Wed Apr 5 16:27:41 2017 -0400

    Perform old-style local delete for shelved offloaded instances

    This fixes a regression from some local delete code added for cells v2
    where it assumed that if an instance did not have a host, it wasn't
    scheduled to a cell yet. That assumption misses the fact that the
    instance won't have a host if it was shelved offloaded. And to be
    shelved offloaded, the instance had to have first been built on a host
    in a cell.

    So we simply duplicate the same check as later in the _delete() method
    for instance.host or shelved-offloaded to decide what the case is.

    Obviously this is all a giant mess of duplicate delete path code that
    needs to be unwound, and that's the plan, but first we're fixing
    regressions and then we can start rolling this duplication all back
    so we can get back to the single local delete flow that we know and love.

    Change-Id: Ie2063f621618c1d90aeb59f0f1d7da351862ea9f
    Closes-Bug: #1678326
    (cherry picked from commit 9245bbf79ddbfd8a2e2310af654711f9d3a547b1)

Matt Riedemann (mriedem) wrote :

For anyone confused, Ie2063f621618c1d90aeb59f0f1d7da351862ea9f was targeting the wrong bug, it doesn't fix this one, it fixes bug 1675570.

This issue was fixed in the openstack/nova 15.0.3 release.

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/452346

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers