After "nova reboot" the instance loses the nova-volume from its XML

Bug #1012717 reported by Mercadolibre CloudBuilders on 2012-06-13
This bug affects 3 people
Affects                     Importance   Assigned to
OpenStack Compute (nova)    Undecided    Unassigned
Essex                       High         Rafi Khardalian

Bug Description

Hi, we are using the Essex final release with nova-volume.
When an instance with nova-volumes attached is rebooted from inside the guest, it comes back with the nova-volume successfully attached.
But if we issue a "nova reboot [id]", the instance loses the nova volume, and the block device definition disappears from /var/run/libvirt/qemu/[instance_id].xml.

We think it's a very critical bug.
If you need any extra information, please ask.

Cheers.

We think the problem is in "_soft_reboot" in /usr/share/pyshared/nova/virt/libvirt/connection.py.

It's confirmed: the soft reboot is what causes the instance to lose its nova volume; the hard reboot works OK.

Ryan Finnie (fo0bar) wrote :

Are you using Xen for backing?

We're using KVM, and I cannot reproduce this. The volume comes back:
 * in-OS reboot
 * euca-reboot-instance (which I believe is doing a hard reboot)
 * nova reboot
 * nova reboot --hard

I can, however, confirm that the XML file does not get updated.

Yes, we are also using KVM.
When you issue a detach, the XML in /var/run/libvirt gets updated, but the file in /etc/libvirt/qemu doesn't. So when you reboot or hard reboot an instance, it gets rebuilt from /etc/libvirt/qemu, and that fails because it can't find the volume, which of course is already detached (iscsiadm).

Waiting for some feedback.
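The divergence described above, between the running domain's XML under /var/run/libvirt and the persistent definition under /etc/libvirt/qemu, can be illustrated with a small toy model. This is a hypothetical sketch, not nova or libvirt code: the `ToyDomain` class, its `live`/`persistent` fields, and the `attach_live_only` / `restart_from_config` methods are invented for illustration, and the XML is a trimmed-down, made-up fragment of a libvirt domain definition. It assumes a restart that rebuilds the guest from the persistent definition, as the comment above describes.

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed-down libvirt-style domain definition (hypothetical, illustration only).
BASE_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/instances/instance-00000001/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

# Hypothetical block-device disk standing in for an attached nova-volume.
VOLUME_XML = """
<disk type='block' device='disk'>
  <source dev='/dev/sdb'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""


def disk_targets(domain):
    """Return the target device names (vda, vdb, ...) of every <disk>."""
    return [d.find("target").get("dev") for d in domain.iter("disk")]


class ToyDomain:
    """Models a domain with separate live and persistent XML definitions."""

    def __init__(self, xml):
        self.persistent = ET.fromstring(xml.strip())  # like /etc/libvirt/qemu/*.xml
        self.live = copy.deepcopy(self.persistent)    # like /var/run/libvirt/qemu/*.xml

    def attach_live_only(self, disk_xml):
        # The buggy behaviour: only the running instance's XML is modified.
        self.live.find("devices").append(ET.fromstring(disk_xml.strip()))

    def restart_from_config(self):
        # A restart that rebuilds the guest from the persistent definition.
        self.live = copy.deepcopy(self.persistent)


dom = ToyDomain(BASE_XML)
dom.attach_live_only(VOLUME_XML)
print(disk_targets(dom.live))  # ['vda', 'vdb']
dom.restart_from_config()
print(disk_targets(dom.live))  # ['vda']  (the volume is gone)
```

The point of the model is only that two copies of the definition exist and that a change applied to one of them is silently discarded when the guest is rebuilt from the other.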

The code was recently changed to keep the VM definition around in libvirt instead of rebuilding it. Do you still see this issue with current trunk?

Vish.
I can confirm that when nova is installed by cloning the repository instead of using the Ubuntu 12.04 packages:

+ git clone https://github.com/openstack/nova.git --branch stable/essex

The problem is fixed. So, who can I talk to about getting the Ubuntu packages updated for these kinds of bugs?
Or is it advisable to use GitHub in a production deployment?

Thierry Carrez (ttx) wrote :

Already fixed in Folsom; nominating for an Essex backport.

Vish: how invasive is the fix for this? Would it make a likely stable/essex backport target?

Changed in nova:
status: New → Invalid

It's weird that no one reported this before; it seems very critical.

Vish Ishaya (vishvananda) wrote :

The fix is a bit invasive. Backporting could be a little tough.

Dave Spano (dspano) wrote :

I'd like to ask for it to be backported as well please.

Han-sebastien (han-sebastien) wrote :

+1 for the backporting please.

Rafi Khardalian (rkhardalian) wrote :

I thought this was already fixed in stable/essex. If it's still happening, please let me know how to reproduce it and I'll commit to submitting a fix.

Rafi Khardalian (rkhardalian) wrote :

Disregard my prior comment about it being fixed in stable, it's not. I'll submit a patch now.

Rafi Khardalian (rkhardalian) wrote :

One note to keep in mind: the patch will only update the libvirt XML as volumes are attached and detached from the point AFTER which the patch is applied. In other words, any volumes which are connected today using unmodified stable/essex code will need to be detached and re-attached in order to persist on subsequent hard reboots.

That said, I do have a much more substantial patch against Essex, which would mitigate the need for the aforementioned process. It does not qualify for submission under the current stable release policies but I'd be happy to provide it off list for anyone interested.
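Because the patch only affects attach/detach operations performed after it is applied, volumes attached earlier remain in the running domain's XML but are absent from the persistent definition. One way to find which volumes would need the detach/re-attach step is to compare the two XML documents. The sketch below is a hypothetical helper (`missing_from_persistent` and `disk_target_set` are not part of nova); it assumes you have already obtained both XML strings, for example from `virsh dumpxml <domain>` for the live definition and `virsh dumpxml --inactive <domain>` for the persistent one. The sample XML is made up.

```python
import xml.etree.ElementTree as ET


def disk_target_set(xml_text):
    """Collect the target dev names of <disk> elements only (other devices
    such as consoles or interfaces can also contain <target> elements)."""
    root = ET.fromstring(xml_text.strip())
    targets = set()
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is not None:
            targets.add(target.get("dev"))
    return targets


def missing_from_persistent(live_xml, persistent_xml):
    """Return disks attached to the running guest but absent from its
    persistent definition; these would be lost on a restart from config."""
    return sorted(disk_target_set(live_xml) - disk_target_set(persistent_xml))


# Hypothetical sample documents: vdb was attached with pre-patch code.
LIVE = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'><source file='/instances/disk'/><target dev='vda'/></disk>
    <disk type='block' device='disk'><source dev='/dev/sdb'/><target dev='vdb'/></disk>
  </devices>
</domain>
"""

PERSISTENT = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'><source file='/instances/disk'/><target dev='vda'/></disk>
  </devices>
</domain>
"""

print(missing_from_persistent(LIVE, PERSISTENT))  # ['vdb']
```

Each device this reports is a candidate for the detach and re-attach described above; an empty list means the two definitions agree on their disks.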

tags: added: essex-backport
Dave Spano (dspano) wrote :

Thank you Rafi!

Reviewed: https://review.openstack.org/12661
Committed: http://github.com/openstack/nova/commit/b375b4f1131d54315bb9952fcf2eff363b3b29b1
Submitter: Jenkins
Branch: stable/essex

commit b375b4f1131d54315bb9952fcf2eff363b3b29b1
Author: Dan Smith <email address hidden>
Date: Fri Jun 29 09:35:02 2012 -0700

    Redefine the domain's XML on volume attach/detach

    This fixes bug 1004791 by adding new disk definitions to the defined
    XML instead of just modifying the running instance.

    Cherry picked for stable/essex to fix bug 1012717.

    Change-Id: I6596dae7c54158c32bc7b399c55a1797b2d98242
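The approach the commit message describes, adding the disk definition to the defined (persistent) XML instead of only modifying the running instance, can be modeled with a toy domain. This is a hypothetical sketch, not the code from commit b375b4f: `ToyDomain`, `attach_volume`, `detach_volume` and `restart_from_config` are invented names, and the XML is made up. Against a real libvirt domain, the Python bindings offer `attachDeviceFlags()` / `detachDeviceFlags()` with `VIR_DOMAIN_AFFECT_LIVE | VIR_DOMAIN_AFFECT_CONFIG` for this kind of change, or the domain can be redefined with `defineXML()`; which mechanism the Essex backport uses is not stated here beyond its title.

```python
import copy
import xml.etree.ElementTree as ET

BASE_XML = """<domain type='kvm'><devices>
  <disk type='file' device='disk'><source file='/instances/disk'/><target dev='vda'/></disk>
</devices></domain>"""

# Hypothetical volume disk.
VOLUME_XML = "<disk type='block' device='disk'><source dev='/dev/sdb'/><target dev='vdb'/></disk>"


def disk_targets(domain):
    """Target device names of every <disk> in a parsed domain."""
    return [d.find("target").get("dev") for d in domain.iter("disk")]


class ToyDomain:
    """Toy domain with separate live and persistent definitions."""

    def __init__(self, xml):
        self.persistent = ET.fromstring(xml)
        self.live = copy.deepcopy(self.persistent)

    def attach_volume(self, disk_xml):
        # Apply the change to BOTH definitions; each tree gets its own element.
        for definition in (self.live, self.persistent):
            definition.find("devices").append(ET.fromstring(disk_xml))

    def detach_volume(self, target_dev):
        # Remove the matching disk from both definitions.
        for definition in (self.live, self.persistent):
            devices = definition.find("devices")
            for disk in devices.findall("disk"):  # findall returns a list copy
                if disk.find("target").get("dev") == target_dev:
                    devices.remove(disk)

    def restart_from_config(self):
        # Rebuild the running guest from the persistent definition.
        self.live = copy.deepcopy(self.persistent)


dom = ToyDomain(BASE_XML)
dom.attach_volume(VOLUME_XML)
dom.restart_from_config()
print(disk_targets(dom.live))  # ['vda', 'vdb']  (the volume survives)
dom.detach_volume("vdb")
dom.restart_from_config()
print(disk_targets(dom.live))  # ['vda']
```

Keeping both definitions in step on every attach and detach is what makes a restart from the persistent definition safe; it does not repair volumes attached before the change, which is why the detach/re-attach note above still applies.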
