VM doesn't resume after detaching volume

Bug #1240922 reported by Bellantuono Daniel
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Nikola Đipanov

Bug Description

Hi guys,

I have a suspended VM with an attached volume. If I detach the volume while the instance is in the suspended state, it cannot be resumed properly.
This happens with both Windows and Linux VMs.

LibVirt error:
2494: error : qemuMonitorIORead:502 : Unable to read from monitor: Connection reset by peer

Package versions on Ubuntu 12.04:
ii libvirt-bin 1.0.2-0ubuntu11.13.04.2~cloud0 programs for the libvirt library
ii libvirt-dev 1.0.2-0ubuntu11.13.04.2~cloud0 development files for the libvirt library
ii libvirt0 1.0.2-0ubuntu11.13.04.2~cloud0 library for interfacing with different virtualization systems
ii python-libvirt 1.0.2-0ubuntu11.13.04.2~cloud0 libvirt Python bindings
ii kvm 1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu14.10 dummy transitional package from kvm to qemu-kvm
ii qemu-common 1.0+noroms-0ubuntu14.10 qemu common functionality (bios, documentation, etc)
ii qemu-kvm 1.0+noroms-0ubuntu14.10 Full virtualization on i386 and amd64 hardware
ii nova-common 1:2013.1.2-0ubuntu1~cloud0 OpenStack Compute - common files
ii nova-compute 1:2013.1.2-0ubuntu1~cloud0 OpenStack Compute - compute node
ii nova-compute-kvm 1:2013.1.2-0ubuntu1~cloud0 OpenStack Compute - compute node (KVM)
ii python-nova 1:2013.1.2-0ubuntu1~cloud0 OpenStack Compute Python libraries
ii python-novaclient 1:2.13.0-0ubuntu1~cloud0 client library for OpenStack Compute API

Changed in nova:
assignee: nobody → Jay Lau (jay-lau-513)
Revision history for this message
Guangya Liu (Jay Lau) (jay-lau-513) wrote :

I found the following errors in the libvirtd logs:

2013-10-18 03:18:06.721+0000: 22179: error : qemuSetupDiskPathAllow:82 : Unable to allow access for disk path /dev/disk/by-path/ip-9.111.242.65:3260-iscsi-iqn.2010-10.org.openstack:volume-5c973e8e-9854-47d9-bb9f-df0415359217-lun-1: No such file or directory
2013-10-18 03:18:07.168+0000: 22179: error : virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink /dev/disk/by-path/ip-9.111.242.65:3260-iscsi-iqn.2010-10.org.openstack:volume-5c973e8e-9854-47d9-bb9f-df0415359217-lun-1: No such file or directory
2013-10-18 03:18:07.606+0000: 22179: error : qemuRemoveCgroup:562 : internal error Unable to find cgroup for instance-0000000d
2013-10-18 03:18:07.606+0000: 22179: warning : qemuProcessStop:3561 : Failed to remove cgroup for instance-0000000d

Revision history for this message
Guangya Liu (Jay Lau) (jay-lau-513) wrote :

Bellantuono, do you have a strong requirement to detach a volume from a suspended VM?

I would like to disable volume detach for suspended/paused VMs. Does that make sense? Thanks.

Revision history for this message
Bellantuono Daniel (kelfen) wrote :

No, I don't have any requirement to detach a volume from a suspended VM.
Disabling volume detach while the VM is suspended/paused seems like an excellent idea!

Thanks a lot for the support!

melanie witt (melwitt)
tags: added: volumes
Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

The solution to this problem may not be all that simple.

When you do a live migration between nodes the instance is suspended, copied, then resumed on the target host. When the instance has a volume attached, it appears that the volume is detached after the instance is suspended. I haven't dug into the code to confirm this, but that's the way it looks to me from the logs.

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Here's the stack trace that shows exactly what the issue is:

2014-03-25 18:41:10.791 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Failed to open file '/dev/disk/by-path/ip-192.168.123.25:3260-iscsi-iqn.2010-10.org.openstack:volume-cecf407a-1c3b-4e70-a84f-d34c93fa3f2a-lun-1': No such file or directory
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/oslo.messaging/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/oslo.messaging/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/oslo.messaging/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 88, in wrapped
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher payload)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/openstack/common/excutils.py", line 68, in __exit__
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 71, in wrapped
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher return f(self, context, *args, **kw)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 278, in decorated_function
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher pass
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/openstack/common/excutils.py", line 68, in __exit__
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 264, in decorated_function
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 329, in decorated_function
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher function(self, context, *args, **kwargs)
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 306, in decorated_function
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher e, sys.exc_info())
2014-03-25 18:41:10.791 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/openstack/common/excutils.py", line...


Changed in nova:
status: New → Triaged
importance: Undecided → Medium
tags: added: icehouse-rc-potential libvirt
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Looks like I spoke too soon: the XML we define the domain with in _create_domain_and_network is correct; however, calling createWithFlags on such a domain throws the above exception.

It might be libvirt caching something it shouldn't be. I'll get danpb to take a look at this one.

Revision history for this message
Daniel Berrange (berrange) wrote :

NB: as a general rule, you must not make any changes to the guest config associated with a managed save image. Doing so creates guest-visible ABI changes which will cause the guest to crash and burn when restoring. The only changes that are safe to make are those which leave the guest-visible ABI unchanged.
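The constraint Daniel describes can be illustrated with a small sketch. This is hypothetical code, not part of libvirt or nova; the field names (`disks`, `nics`, `machine_type`, `disk_cache`) are made-up illustrations of guest-visible versus host-only configuration:

```python
# Illustrative sketch of the managed-save rule: while a saved guest image
# exists, only config changes that leave the guest-visible ABI intact are
# safe to apply. Detaching a disk changes what the guest sees, so it isn't.

# Hypothetical split: fields the guest can observe vs. host-side-only fields.
GUEST_VISIBLE = {"disks", "nics", "machine_type"}

def safe_to_apply(saved_config, new_config):
    """Return True iff the change leaves the guest-visible ABI unchanged."""
    return all(saved_config.get(k) == new_config.get(k) for k in GUEST_VISIBLE)

saved = {"disks": ["vda", "vdb"], "nics": 1,
         "machine_type": "pc-i440fx", "disk_cache": "none"}

# Changing a host-side detail (cache mode) is invisible to the guest.
host_only_change = dict(saved, disk_cache="writeback")
# Detaching a disk is visible to the guest: one device disappears.
detached_disk = dict(saved, disks=["vda"])

print(safe_to_apply(saved, host_only_change))  # True
print(safe_to_apply(saved, detached_disk))     # False
```

This is exactly why detaching a volume from a suspended (managed-save) instance leaves it unresumable: the saved guest state no longer matches the domain definition it is restored against.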

Nikola Đipanov (ndipanov)
Changed in nova:
milestone: none → icehouse-rc1
assignee: Jay Lau (jay-lau-513) → Nikola Đipanov (ndipanov)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/83505
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d0948a1fb0a4c425310f0cf0aea5b28680dc4817
Submitter: Jenkins
Branch: master

commit d0948a1fb0a4c425310f0cf0aea5b28680dc4817
Author: Nikola Dipanov <email address hidden>
Date: Thu Mar 27 18:01:22 2014 +0100

    Disable volume attach/detach for suspended instances

    As described in the bug - some hypervisors (libvirt) do not support
    this. It is best to disable it in the API to provide a consistent user
    experience.

    Also adds a test to prevent an accidental regression.

    Change-Id: I5b404cca22cffecbaf524e2511810e5341242052
    Closes-bug: #1240922
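The shape of the fix is a state check at the API layer. Here is a minimal sketch of that idea, assuming a decorator-style guard similar in spirit to nova's vm_state checks; the class, state names, and allowed set below are illustrative, not the actual nova code:

```python
# Hypothetical sketch of an API-level vm_state guard: volume detach is
# rejected unless the instance is in an allowed state, with "suspended"
# deliberately excluded (per the fix above).

ALLOWED_VOLUME_STATES = {"active", "paused", "stopped"}  # illustrative set

class InstanceInvalidState(Exception):
    pass

def check_instance_state(allowed):
    """Decorator: raise if the instance's vm_state is not in `allowed`."""
    def decorator(func):
        def wrapper(self, instance, *args, **kwargs):
            state = instance["vm_state"]
            if state not in allowed:
                raise InstanceInvalidState(
                    "Cannot %s while instance is in vm_state %s"
                    % (func.__name__, state))
            return func(self, instance, *args, **kwargs)
        return wrapper
    return decorator

class ComputeAPI:
    @check_instance_state(ALLOWED_VOLUME_STATES)
    def detach_volume(self, instance, volume_id):
        # Real detach logic would live here; this sketch just reports it.
        return "detached %s" % volume_id

api = ComputeAPI()
print(api.detach_volume({"vm_state": "active"}, "vol-1"))  # detached vol-1
try:
    api.detach_volume({"vm_state": "suspended"}, "vol-1")
except InstanceInvalidState as e:
    print(e)  # rejected before touching the hypervisor
```

Failing fast in the API this way gives a clear error to the user instead of a domain that silently cannot be resumed.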

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-rc1 → 2014.1