After "nova reboot" the instance loses the nova-volume on its xml

Bug #1012717 reported by Mercadolibre CloudBuilders
This bug affects 3 people
Affects                   Status        Importance  Assigned to      Milestone
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
Essex                     Fix Released  High        Rafi Khardalian

Bug Description

Hi, we are using the Essex final release with nova volumes.
When an instance with nova volumes is rebooted from inside the guest, it comes back with the nova volume attached.
But if we issue a "nova reboot [id]", the instance loses the nova volume, and the block device definition disappears from /var/run/libvirt/qemu/[instance_id].xml

We think it's a very critical bug.
If you need any extra information, please ask.

Cheers.

Mercadolibre CloudBuilders (cloudbuilders-n) wrote :

We think the problem is in "_soft_reboot" in /usr/share/pyshared/nova/virt/libvirt/connection.py.

Mercadolibre CloudBuilders (cloudbuilders-n) wrote :

It's confirmed: the soft_reboot is what causes the instance to lose its nova volume; hard_reboot works OK.

Ryan Finnie (fo0bar) wrote :

Are you using Xen for backing?

We're using KVM, and I cannot reproduce this. The volume comes back:
 * in-OS reboot
 * euca-reboot-instance (which I believe is doing a hard reboot)
 * nova reboot
 * nova reboot --hard

I can however confirm that the XML file does not get updated.

Mercadolibre CloudBuilders (cloudbuilders-n) wrote :

Exactly, we are also using KVM.
When you issue a detach, the XML in /var/run/libvirt gets updated, but the file in /etc/libvirt/qemu doesn't. So when you reboot or hard reboot an instance, it gets rebuilt from /etc/libvirt/qemu and fails because it can't find the volume, which has of course already been detached (iscsiadm).

Waiting for some feedback.
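
The mismatch described above, where the run-time XML and the persistent XML list different disks, can be checked mechanically. The following is only a minimal sketch using the Python standard library: the function names and the sample XML are hypothetical, and it assumes both definitions are available as strings (for example read from the two files mentioned above). Because the run-time file may wrap the domain in an outer element, disks are searched for at any depth.

```python
import xml.etree.ElementTree as ET


def disk_targets(domain_xml):
    """Return the set of disk target names (e.g. 'vdb') found in a domain XML string."""
    root = ET.fromstring(domain_xml)
    targets = set()
    # Search at any depth so a wrapper element around <domain> is tolerated.
    for disk in root.findall(".//devices/disk"):
        target = disk.find("target")
        if target is not None and target.get("dev"):
            targets.add(target.get("dev"))
    return targets


def definition_drift(live_xml, persistent_xml):
    """Return (only_in_live, only_in_persistent) as sorted lists of target names."""
    live = disk_targets(live_xml)
    persistent = disk_targets(persistent_xml)
    return sorted(live - persistent), sorted(persistent - live)


# Hypothetical sample data: the volume 'vdb' exists only in the live definition.
LIVE_XML = """<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='file' device='disk'><target dev='vda' bus='virtio'/></disk>
    <disk type='block' device='disk'><target dev='vdb' bus='virtio'/></disk>
  </devices>
</domain>"""

PERSISTENT_XML = """<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='file' device='disk'><target dev='vda' bus='virtio'/></disk>
  </devices>
</domain>"""

if __name__ == "__main__":
    only_live, only_persistent = definition_drift(LIVE_XML, PERSISTENT_XML)
    print("only in live XML:", only_live)
    print("only in persistent XML:", only_persistent)
```

A disk that appears only in the live list would be expected to go missing once the domain is rebuilt from the persistent definition; a disk that appears only in the persistent list matches the detach case described in this comment.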

Vish Ishaya (vishvananda) wrote : Re: [Bug 1012717] After "nova reboot" the instance loses the nova-volume on its xml

The code was recently changed to keep the VM definition around in libvirt instead of rebuilding it. Do you still see this issue with current trunk?

Mercadolibre CloudBuilders (cloudbuilders-n) wrote :

Vish,
I can confirm that when we install nova from a git clone instead of the Ubuntu 12.04 packages:

+ git clone https://github.com/openstack/nova.git --branch stable/essex

the problem is fixed. So who can I talk to about getting the Ubuntu packages upgraded for this kind of bug?
Or is it advisable to use GitHub in a production deployment?

Thierry Carrez (ttx) wrote :

Already fixed in Folsom, nominating for Essex backport.

Vish: how invasive is the fix for this? Would it make a likely stable/essex backport target?

Changed in nova:
status: New → Invalid

Mercadolibre CloudBuilders (cloudbuilders-n) wrote :

Weird that no one reported this before, seems very critical.

Vish Ishaya (vishvananda) wrote :

The fix is a bit invasive. Backporting could be a little tough.

Dave Spano (dspano) wrote :

I'd like to ask for it to be backported as well please.

Han-sebastien (han-sebastien) wrote :

+1 for the backporting please.

Rafi Khardalian (rkhardalian) wrote :

I thought this was already fixed in stable/essex. If it's still happening, please let me know how to reproduce it and I'll commit to submitting a fix.

Rafi Khardalian (rkhardalian) wrote :

Disregard my prior comment about it being fixed in stable; it's not. I'll submit a patch now.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/essex)

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/12661

Rafi Khardalian (rkhardalian) wrote :

One note to keep in mind -- the patch only updates the libvirt XML for volumes that are attached or detached AFTER the patch is applied. In other words, any volumes which are connected today using unmodified stable/essex code will need to be detached and re-attached in order to persist across subsequent hard reboots.

That said, I do have a much more substantial patch against Essex, which would mitigate the need for the aforementioned process. It does not qualify for submission under the current stable release policies but I'd be happy to provide it off list for anyone interested.

tags: added: essex-backport

Dave Spano (dspano) wrote :

Thank you Rafi!

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/essex)

Reviewed: https://review.openstack.org/12661
Committed: http://github.com/openstack/nova/commit/b375b4f1131d54315bb9952fcf2eff363b3b29b1
Submitter: Jenkins
Branch: stable/essex

commit b375b4f1131d54315bb9952fcf2eff363b3b29b1
Author: Dan Smith <email address hidden>
Date: Fri Jun 29 09:35:02 2012 -0700

    Redefine the domain's XML on volume attach/detach

    This fixes bug 1004791 by adding new disk definitions to the defined
    XML instead of just modifying the running instance.

    Cherry picked for stable/essex to fix bug 1012717.

    Change-Id: I6596dae7c54158c32bc7b399c55a1797b2d98242
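
The commit message above describes adding disk definitions to the defined (persistent) XML rather than only to the running instance. The commit's own code is not reproduced in this thread, so the following is just an illustrative sketch of that idea using the Python standard library. The function names `add_disk` and `remove_disk`, the sample domain, and the device path are all hypothetical and are not taken from nova.

```python
import xml.etree.ElementTree as ET


def _find_disk(devices, target_dev):
    """Return the <disk> element whose <target dev=...> matches, or None."""
    for disk in devices.findall("disk"):
        target = disk.find("target")
        if target is not None and target.get("dev") == target_dev:
            return disk
    return None


def add_disk(domain_xml, source_dev, target_dev):
    """Return domain XML with a block-device <disk> added under <devices>.

    The input is returned unchanged if target_dev is already defined.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    if _find_disk(devices, target_dev) is not None:
        return domain_xml
    disk = ET.SubElement(devices, "disk", {"type": "block", "device": "disk"})
    ET.SubElement(disk, "source", {"dev": source_dev})
    ET.SubElement(disk, "target", {"dev": target_dev, "bus": "virtio"})
    return ET.tostring(root, encoding="unicode")


def remove_disk(domain_xml, target_dev):
    """Return domain XML with the disk for target_dev removed, if present."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is not None:
        disk = _find_disk(devices, target_dev)
        if disk is not None:
            devices.remove(disk)
    return ET.tostring(root, encoding="unicode")


# Hypothetical persistent definition with no attached volumes.
BASE_XML = "<domain type='kvm'><name>instance-00000001</name><devices/></domain>"

if __name__ == "__main__":
    # The source path below is a made-up placeholder, not a real iSCSI device.
    updated = add_disk(BASE_XML, "/dev/disk/by-path/example-volume-lun-0", "vdb")
    print(updated)
```

In a real deployment the updated string would still have to be handed back to libvirt (for instance through the Python binding's `defineXML` call on a connection) for the change to survive a hard reboot; that step is left out here so the sketch runs without a hypervisor.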
