Using reset for hard_reboot is not reliable

Bug #1036826 reported by Rafi Khardalian
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Rafi Khardalian

Bug Description

Using reset for hard_reboot is not reliable, even where it is supported by libvirt. Hard reboots are one of the only ways to recover a VM in a broken state. The reset command assumes the domain is running in some capacity and will fail if it is not. Here are some steps to reproduce:

1. Create a new libvirt VM (using qemu for my testing).

2. Run virsh list to confirm that it is running:

virsh # list
 Id    Name                 State
----------------------------------
 3     instance-00000001    running

3. Find the PID of the qemu/kvm process and kill it with kill -9. Run virsh list --all to confirm:

virsh # list --all
 Id    Name                 State
----------------------------------
 -     instance-00000001    shut off

4. Issue a virsh reset, as the code would do:

virsh # reset instance-00000001
error: Failed to reset domain instance-00000001
error: Requested operation is not valid: domain is not running
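The failure in step 4 can be illustrated with a small toy model. Everything in it (ToyDomain, DomainNotRunning, kill_process) is hypothetical and only mimics the behavior observed above; it does not call libvirt. The point is that a reset acts on a live guest, so once the qemu/kvm process is gone there is nothing left to reset.

```python
# Toy model (hypothetical, not libvirt code) of why reset cannot
# recover a domain whose qemu/kvm process has been killed.


class DomainNotRunning(Exception):
    """Stands in for libvirt's 'Requested operation is not valid' error."""


class ToyDomain:
    def __init__(self, name):
        self.name = name
        self.state = "running"

    def kill_process(self):
        # Simulates `kill -9` on the qemu/kvm process: the domain
        # is now reported as shut off.
        self.state = "shut off"

    def reset(self):
        # Like `virsh reset`: only valid while the guest is running.
        if self.state != "running":
            raise DomainNotRunning(
                f"Failed to reset domain {self.name}: domain is not running")
        # A real reset restarts the guest OS in place; the process
        # keeps running, so the state does not change here.


dom = ToyDomain("instance-00000001")
dom.reset()          # fine while the guest is running
dom.kill_process()
try:
    dom.reset()
except DomainNotRunning as exc:
    print(exc)       # prints: Failed to reset domain instance-00000001: domain is not running
```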

There is no way to recover this VM without manual intervention. Reverting to the old behavior (commenting out the conditional and forcing the code below) works much more reliably:

            self._destroy(instance)
            self._create_domain(xml, virt_dom)
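The destroy-then-create pattern can be sketched as follows. This is not nova's actual _destroy/_create_domain code; it is a minimal sketch, assuming a persistent (defined) domain object with the libvirt Python bindings' virDomain methods isActive(), destroy() and create(). FakeDomain is a hypothetical stand-in used only so the example runs without libvirt.

```python
def hard_reboot(dom):
    """Destroy-then-create hard reboot (sketch).

    `dom` is assumed to behave like a libvirt virDomain for a persistent
    domain: isActive() is nonzero while running, destroy() forcibly stops
    it, and create() boots it from its stored definition. Unlike reset(),
    this path works whether or not the guest is still running.
    """
    if dom.isActive():
        # Tear down whatever is left of the running guest.
        dom.destroy()
    # Boot a fresh guest from the stored definition.
    dom.create()


class FakeDomain:
    """Hypothetical stand-in for virDomain, for demonstration only."""

    def __init__(self, running):
        self.running = running
        self.boots = 0

    def isActive(self):
        return 1 if self.running else 0

    def destroy(self):
        if not self.running:
            raise RuntimeError("domain is not running")
        self.running = False

    def create(self):
        if self.running:
            raise RuntimeError("domain is already running")
        self.running = True
        self.boots += 1


crashed = FakeDomain(running=False)   # e.g. after kill -9 of qemu
hard_reboot(crashed)
print(crashed.isActive(), crashed.boots)   # prints: 1 1
```

Checking isActive() first avoids calling destroy() on a domain that is already shut off. A production version would also need to tolerate the domain dying between that check and the destroy() call.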

Hard reset is the current sledgehammer for fixing issues and it really needs to stay that way.

OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/11371

Changed in nova:
assignee: nobody → Rafi Khardalian (rkhardalian)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/11371
Committed: http://github.com/openstack/nova/commit/2b7d5c783a330fbf5a54cd5f63dbf5f1004c6103
Submitter: Jenkins
Branch: master

commit 2b7d5c783a330fbf5a54cd5f63dbf5f1004c6103
Author: Rafi Khardalian <email address hidden>
Date: Tue Aug 14 20:41:23 2012 +0000

    Revert to prior method of executing a libvirt hard_reboot.

    Fixes bug 1036826.

    Using reset for hard_reboot is not reliable, even where it is supported
    by libvirt. Hard reboots are one of the only ways to recover a VM in a
    broken state. The reset command assumes the domain is running in some
    capacity and will fail if it is not.

    Hard reset is the current sledgehammer for fixing issues and it really
    needs to stay that way.

    Change-Id: I84705b72d79cf71adad066b18267fdfb199bc9cb

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2