instance destroy causes libvirtError: Unable to read from monitor

Bug #1064581 reported by Dan Prince
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        wangpan

Bug Description

I'm frequently seeing the following stack trace in Nova's compute.log when running test suites. The exception seems to occur when instances are terminated. I've seen it on SmokeStack, and today it happened again when I ran the latest Grizzly trunk code (Git hash 36b8525) manually to investigate the issue further.

2012-10-09 13:24:56 AUDIT nova.compute.manager [req-4909f79a-f35a-4590-88fc-5a94ad6fc9fc aff1b54efbbf4a56b88d6d92790190d0 df64b634751947d5b0fbd813f813d5e7] [instance: 5420df8c-b1ad-4373-bc6f-c2088325ad89] Terminating instance
2012-10-09 13:25:00 ERROR nova.manager [-] Error during ComputeManager.update_available_resource: Unable to read from monitor: Connection reset by peer
2012-10-09 13:25:00 TRACE nova.manager Traceback (most recent call last):
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/manager.py", line 175, in periodic_tasks
2012-10-09 13:25:00 TRACE nova.manager task(self, context)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2804, in update_available_resource
2012-10-09 13:25:00 TRACE nova.manager self.resource_tracker.update_available_resource(context)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/utils.py", line 760, in inner
2012-10-09 13:25:00 TRACE nova.manager retval = f(*args, **kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 390, in update_available_resource
2012-10-09 13:25:00 TRACE nova.manager resources = self.driver.get_available_resource()
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2220, in get_available_resource
2012-10-09 13:25:00 TRACE nova.manager 'disk_available_least': self.get_disk_available_least()}
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2712, in get_disk_available_least
2012-10-09 13:25:00 TRACE nova.manager self.get_instance_disk_info(i_name))
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2659, in get_instance_disk_info
2012-10-09 13:25:00 TRACE nova.manager xml = virt_dom.XMLDesc(0)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 187, in doit
2012-10-09 13:25:00 TRACE nova.manager result = proxy_call(self._autowrap, f, *args, **kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 147, in proxy_call
2012-10-09 13:25:00 TRACE nova.manager rv = execute(f,*args,**kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 76, in tworker
2012-10-09 13:25:00 TRACE nova.manager rv = meth(*args,**kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib64/python2.7/site-packages/libvirt.py", line 381, in XMLDesc
2012-10-09 13:25:00 TRACE nova.manager if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
2012-10-09 13:25:00 TRACE nova.manager libvirtError: Unable to read from monitor: Connection reset by peer
2012-10-09 13:25:00 TRACE nova.manager
2012-10-09 13:25:00 INFO nova.virt.libvirt.driver [-] [instance: 5420df8c-b1ad-4373-bc6f-c2088325ad89] Instance destroyed successfully.

Dan Prince (dan-prince)
Changed in nova:
importance: Undecided → High
Daniel Berrange (berrange) wrote:

> 2012-10-09 13:25:00 TRACE nova.manager libvirtError: Unable to read from monitor: Connection reset by peer

This indicates that QEMU shut down while libvirt was talking to its monitor to fulfill the virDomainGetXMLDesc() API call.

This obviously happened because another thread was destroying that instance while the stats collection thread was working.

I guess the stats collection thread needs to deal with the possibility that a VM can disappear while it's talking to it, instead of throwing the exception.
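A minimal sketch of the handling suggested above, assuming a stats loop that walks a list of domains and asks each for its XML. `LibvirtError` and `FakeDomain` are hypothetical stand-ins for `libvirt.libvirtError` and a libvirt domain object so the example runs without a hypervisor; `collect_domain_xml` is likewise a hypothetical name, not nova code.

```python
class LibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeDomain(object):
    """Hypothetical stand-in for a libvirt domain object."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def XMLDesc(self, flags):
        if not self.alive:
            # Mimics QEMU exiting while libvirt is reading from its monitor.
            raise LibvirtError(
                "Unable to read from monitor: Connection reset by peer")
        return "<domain><name>%s</name></domain>" % self.name


def collect_domain_xml(domains):
    """Return (xml_by_name, skipped_names).

    A domain that disappears mid-call is recorded in skipped_names instead
    of aborting the whole collection pass.
    """
    xml_by_name = {}
    skipped_names = []
    for dom in domains:
        try:
            xml_by_name[dom.name] = dom.XMLDesc(0)
        except LibvirtError:
            # The VM was destroyed concurrently; treat it as gone.
            skipped_names.append(dom.name)
    return xml_by_name, skipped_names


if __name__ == "__main__":
    xml, skipped = collect_domain_xml(
        [FakeDomain("instance-a"), FakeDomain("instance-b", alive=False)])
    print(sorted(xml), skipped)  # prints: ['instance-a'] ['instance-b']
```

Catching every `libvirtError` is deliberately broad in this sketch. The real binding's `libvirtError` exposes `get_error_code()`, so a production version could restrict the skip to error codes that indicate a vanished domain and re-raise anything else.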

Changed in nova:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/19287

Changed in nova:
assignee: nobody → wangpan (hzwangpan)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/19287
Committed: http://github.com/openstack/nova/commit/67376d34ed6bf68b15ac87ef444887cbc27dc6b0
Submitter: Jenkins
Branch: master

commit 67376d34ed6bf68b15ac87ef444887cbc27dc6b0
Author: Wangpan <email address hidden>
Date: Wed Jan 9 19:33:43 2013 +0800

    Map libvirt error to InstanceNotFound in get_instance_disk_info

    When getting instance disk info, the instance may be destroyed/deleted, and a
    libvirtError will be raised in XMLDesc method, so catching and mapping it to
    InstanceNotFound and the caller can handle it correctly.

    Fixes bug #1064581

    Change-Id: I07fed3e82e10dad4cb84ae5c8650ada351c24e78
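The commit message describes a two-step pattern: translate the low-level libvirt error into a domain-level `InstanceNotFound` where it occurs, and let the caller decide what a missing instance means. The sketch below illustrates that pattern only. `get_instance_disk_info` echoes the name in the traceback, but its body, the `lookup` callable, `count_instance_disks` (standing in for `get_disk_available_least`) and the stand-in exception and domain classes are all hypothetical and are not the code merged in 67376d34.

```python
import xml.etree.ElementTree as ET


class LibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class InstanceNotFound(Exception):
    """Hypothetical stand-in for nova's InstanceNotFound exception."""


class FakeDomain(object):
    """Hypothetical domain whose XMLDesc fails if its QEMU process is gone."""

    def __init__(self, xml=None):
        self._xml = xml

    def XMLDesc(self, flags):
        if self._xml is None:
            raise LibvirtError(
                "Unable to read from monitor: Connection reset by peer")
        return self._xml


def get_instance_disk_info(instance_name, lookup):
    """Return the backing-file paths of an instance's disks.

    `lookup` is a hypothetical callable mapping an instance name to a
    domain-like object. A libvirt failure is translated into
    InstanceNotFound so callers never see the low-level error.
    """
    try:
        xml = lookup(instance_name).XMLDesc(0)
    except LibvirtError:
        raise InstanceNotFound(instance_name)
    root = ET.fromstring(xml)
    return [src.get("file") for src in root.findall("./devices/disk/source")]


def count_instance_disks(instance_names, lookup):
    """Hypothetical caller that tolerates instances vanishing mid-scan."""
    total = 0
    for name in instance_names:
        try:
            total += len(get_instance_disk_info(name, lookup))
        except InstanceNotFound:
            # The instance was destroyed while we were scanning; skip it.
            continue
    return total
```

Translating at the driver boundary keeps libvirt-specific exceptions out of the periodic task: the caller only needs to know that a missing instance contributes nothing to the totals.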

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1