OpenStack Compute (nova)

Using reset for hard_reboot is not reliable

Bug #1036826 reported by Rafi Khardalian on 2012-08-14

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	Undecided	Rafi Khardalian	OpenStack Compute (nova) 2012.2 "folsom"

Bug Description

Using reset for hard_reboot is not reliable, even where it is supported by libvirt. A hard reboot is one of the few ways to recover a VM in a broken state. The reset command assumes the domain is running in some capacity and fails if it is not. Steps to reproduce:

1. Create a new libvirt VM (using qemu for my testing).

2. virsh list # validate it is running

virsh # list
Id Name State
----------------------------------
3 instance-00000001 running

3. Find and kill -9 the pid of the qemu/kvm process. virsh list --all to confirm:

virsh # list --all
Id Name State
----------------------------------
- instance-00000001 shut off

4. Issue a virsh reset, as the code would do:

virsh # reset instance-00000001
error: Failed to reset domain instance-00000001
error: Requested operation is not valid: domain is not running

There is no way to recover this VM without manual intervention. Reverting to the old behavior, by commenting out the conditional and forcing the code below to run, works much more reliably:

self._destroy(instance)
self._create_domain(xml, virt_dom)

Hard reset is the current sledgehammer for fixing issues and it really needs to stay that way.
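
The difference between the two approaches can be sketched without a hypervisor. The following is only an illustration under stated assumptions: FakeDomain is a hypothetical stand-in for a libvirt domain (it is not part of libvirt or nova), it borrows the method names isActive, reset, destroy and create from the libvirt Python bindings, and it uses RuntimeError where the real bindings would raise their own error type. The two hard_reboot_* helpers are likewise hypothetical names, not nova's actual driver methods.

```python
class FakeDomain:
    """Hypothetical stand-in for a libvirt domain (assumption, not real libvirt)."""

    def __init__(self, name, active=True):
        self.name = name
        self.active = active

    def isActive(self):
        return 1 if self.active else 0

    def reset(self, flags=0):
        # Mirrors the failure in the reproduction: reset needs a running domain.
        if not self.active:
            raise RuntimeError(
                "Requested operation is not valid: domain is not running")

    def destroy(self):
        # Forcefully stop the domain; only meaningful while it is running.
        if not self.active:
            raise RuntimeError(
                "Requested operation is not valid: domain is not running")
        self.active = False

    def create(self):
        # Start the (defined) domain again.
        self.active = True


def hard_reboot_via_reset(dom):
    """Reset-based reboot: fails when the qemu process is already gone."""
    dom.reset(0)


def hard_reboot_via_destroy_create(dom):
    """Destroy/create reboot: works whether or not the domain is running."""
    if dom.isActive():
        dom.destroy()
    dom.create()


if __name__ == "__main__":
    dead = FakeDomain("instance-00000001", active=False)
    try:
        hard_reboot_via_reset(dead)
    except RuntimeError as exc:
        print("reset failed:", exc)
    hard_reboot_via_destroy_create(dead)
    print("active after destroy/create:", bool(dead.isActive()))
```

The destroy/create path does not depend on the starting state of the domain, which is why it can recover an instance whose process was killed while the reset path cannot.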

OpenStack Infra (hudson-openstack) wrote on 2012-08-14: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/11371

Changed in nova:
assignee:	nobody → Rafi Khardalian (rkhardalian)
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2012-08-17: Fix merged to nova (master)

Reviewed: https://review.openstack.org/11371
Committed: http://github.com/openstack/nova/commit/2b7d5c783a330fbf5a54cd5f63dbf5f1004c6103
Submitter: Jenkins
Branch: master

commit 2b7d5c783a330fbf5a54cd5f63dbf5f1004c6103
Author: Rafi Khardalian <email address hidden>
Date: Tue Aug 14 20:41:23 2012 +0000

    Revert to prior method of executing a libvirt hard_reboot.

    Fixes bug 1036826.

    Using reset for hard_reboot is not reliable, even where it is supported
    by libvirt. Hard reboots are one of the only ways to recover a VM in a
    broken state. The reset command assumes the domain is running in some
    capacity and will fail if it is not.

    Hard reset is the current sledgehammer for fixing issues and it really
    needs to stay that way.

    Change-Id: I84705b72d79cf71adad066b18267fdfb199bc9cb

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2012-09-19

Changed in nova:
milestone:	none → folsom-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2012-09-27

Changed in nova:
milestone:	folsom-rc1 → 2012.2
