Instance can not be deleted after soft reboot

Bug #1111213 reported by wangpan
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        wangpan
Folsom                    Won't Fix     High        Vish Ishaya

Bug Description

Steps to reproduce in devstack:
1. Create an instance (one that does not support ACPI works best).
2. Soft reboot it.
3. Wait a minute, then delete it.
This is a race condition, so it does not reproduce every time. To trigger it reliably, add a time.sleep(10) to nova/virt/libvirt/driver.py:_destroy(), like this:
                LOG.info(_("Instance destroyed successfully."),
                         instance=instance)
                raise utils.LoopingCallDone()

        LOG.debug(_("-----------------------sleep 10 start-------------------------"))
        time.sleep(10)
        timer = utils.FixedIntervalLoopingCall(_wait_for_destroy)
With this change in place, the instance cannot be deleted, even after several delete attempts.

The likely cause:
1. Soft reboot waits for the instance to reach the 'shutdown' state and then starts it.
2. The delete operation also waits for this state and then cleans up the instance.
3. If soft reboot sees the instance reach 'shutdown' first, it starts the instance immediately.
4. The delete operation then enters the _wait_for_destroy loop, which may never terminate.
5. When we delete the instance again, the earlier delete operation still holds the lock, so the new request waits for the lock and never actually runs.
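
The ordering problem in steps 1 to 4 can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions: FakeHypervisor, wait_for_destroy, reboot_wins and max_polls are hypothetical names invented for illustration and are not part of nova or libvirt. The poll bound exists only so the demo terminates; the loop described above has no such bound.

```python
import itertools


class FakeHypervisor:
    """Toy stand-in for a hypervisor: every boot gives the domain a new id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.domain_id = next(self._ids)
        self.state = "running"

    def shutdown(self):
        self.state = "shutdown"

    def start(self):
        self.domain_id = next(self._ids)
        self.state = "running"


def wait_for_destroy(hv, reboot_wins, max_polls=5):
    """Model the delete path; return True if the delete could finish.

    reboot_wins=True models soft reboot noticing 'shutdown' before the
    delete path does and restarting the domain immediately.
    """
    hv.shutdown()                 # the domain stops
    if reboot_wins:
        hv.start()                # soft reboot restarts it first
    for _ in range(max_polls):    # bounded here only so the demo ends
        if hv.state == "shutdown":
            return True           # delete sees the stopped domain
        # the domain is running again, so the delete path keeps polling
    return False


if __name__ == "__main__":
    print(wait_for_destroy(FakeHypervisor(), reboot_wins=False))
    print(wait_for_destroy(FakeHypervisor(), reboot_wins=True))
```

When the reboot path wins, the polling loop never observes the stopped state, which matches the stuck delete described in step 4.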

Changed in nova:
assignee: nobody → Matthew Sherborne (msherborne+openstack)
status: New → In Progress
wangpan (hzwangpan) wrote :

Hi Matthew, I have already posted a commit to fix this bug:
https://review.openstack.org/#/c/20883/2

Changed in nova:
assignee: Matthew Sherborne (msherborne+openstack) → wangpan (hzwangpan)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/22130

Sean Dague (sdague)
Changed in nova:
importance: Undecided → High
milestone: none → grizzly-rc1
tags: added: folsom-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/22130
Committed: http://github.com/openstack/nova/commit/27330ac85c4353d9124b8788c727e1ce40f55ea8
Submitter: Jenkins
Branch: master

commit 27330ac85c4353d9124b8788c727e1ce40f55ea8
Author: Wangpan <email address hidden>
Date: Sun Feb 17 09:57:23 2013 +0800

    Fix instance can not be deleted after soft reboot

    The reason is that:
    1. soft reboot will wait for instance become to 'shutdown', and then start it
    2. delete operation also wait for this, and then clean up the instance
    3. if soft reboot found the instance become to 'shutdown' firstly, it will
    start it immediately
    4. then the delete operation will go to the _wait_for_destroy loop, and the
    loop may be endless
    5. when we delete the instance again, because the lock was hold by the delete
    operation before, this one will wait the lock and don't implement actually.
    So the domain id is checked during _wait_for_destroy loop, if it changed and
    the instance is still running, we need to destroy it again.
    If the domain is booted after _wait_for_destroy, it may result in
    unfilter_instance failed because the nwfilter is in use, so doing the same
    thing as above.

    Fixes Bug #1111213

    Change-Id: I98dc902dd86fa828f5821465c611953e08f9f637
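
The idea in the commit message (compare the domain id while waiting, and destroy again if it changed and the instance is still running) might look roughly like the sketch below. It is not the code from the actual change: FakeDomain, restarts_pending, destroy_until_gone and max_attempts are hypothetical names used only to make the example self-contained.

```python
import itertools


class FakeDomain:
    """Toy domain whose id changes every time it is booted."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.domain_id = next(self._ids)
        self.running = True
        # How many times a racing soft reboot will restart the domain.
        self.restarts_pending = 0

    def destroy(self):
        self.running = False
        if self.restarts_pending > 0:
            # A concurrent soft reboot boots the domain again.
            self.restarts_pending -= 1
            self.domain_id = next(self._ids)
            self.running = True


def destroy_until_gone(domain, max_attempts=10):
    """Destroy the domain, retrying when it reappears under a new id.

    Returns the number of destroy calls that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        expected_id = domain.domain_id
        domain.destroy()
        if not domain.running:
            return attempt
        if domain.domain_id != expected_id:
            continue  # booted again behind our back: destroy it again
        raise RuntimeError("domain is still running under the same id")
    raise RuntimeError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    racing = FakeDomain()
    racing.restarts_pending = 1
    print(destroy_until_gone(racing))
```

The id comparison is what lets the delete path tell "still shutting down" apart from "was started again", so it can retry instead of polling forever.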

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/23053

tags: removed: folsom-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/23069
Committed: http://github.com/openstack/nova/commit/5eed26fec4185dfcd8e9c1877a5a47068b0f3cc6
Submitter: Jenkins
Branch: master

commit 5eed26fec4185dfcd8e9c1877a5a47068b0f3cc6
Author: Wangpan <email address hidden>
Date: Wed Feb 27 14:41:41 2013 +0800

    Catching InstanceNotFound exception during reboot instance

    If the instance is deleted during reboot(most of soft reboot), an
    InstanceNotFound exception may be raised when update instance info to DB,
    and the instance may become running deleted, so catching the exception and
    logging it.
    This commit is a supplement of bug #1111213, which may result in the instance
    becomes running deleted, when deleting an instance after soft reboot.

    Change-Id: I3e8df109d431040c64e87f16ca84ff5b62dde898
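
A minimal sketch of the pattern this follow-up commit describes: catch the not-found error when writing the post-reboot state, and log it rather than let it propagate. The InstanceNotFound class here is a local stand-in for nova's exception, and FakeDB and finish_reboot are hypothetical names, not nova APIs.

```python
import logging

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Local stand-in for the exception raised when the DB row is gone."""


class FakeDB:
    """Toy instance table keyed by instance id."""

    def __init__(self, rows):
        self.rows = rows

    def update(self, instance_id, values):
        if instance_id not in self.rows:
            raise InstanceNotFound(instance_id)
        self.rows[instance_id].update(values)


def finish_reboot(db, instance_id):
    """Record the post-reboot state, tolerating a concurrent delete.

    Returns True if the update was written, False if the instance was
    deleted while the reboot was in progress.
    """
    try:
        db.update(instance_id, {"power_state": "running"})
        return True
    except InstanceNotFound:
        LOG.warning("Instance %s was deleted during reboot", instance_id)
        return False


if __name__ == "__main__":
    db = FakeDB({"a": {}})
    print(finish_reboot(db, "a"))
    print(finish_reboot(db, "gone"))
```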

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1
Alan Pevec (apevec) wrote :

stable/folsom review was abandoned
