OpenStack Compute (nova)

instance destroy causes libvirtError: Unable to read from monitor

Bug #1064581 reported by Dan Prince on 2012-10-09

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	wangpan	OpenStack Compute (nova) 2013.1 "grizzly"

Bug Description

I'm seeing the following stack trace frequently in the Nova compute.log when running test suites. The exception seems to occur when instances are terminated. I've seen it on SmokeStack, and it happened again today when I ran the latest Grizzly trunk code manually (Git hash 36b8525) to investigate the issue further.

2012-10-09 13:24:56 AUDIT nova.compute.manager [req-4909f79a-f35a-4590-88fc-5a94ad6fc9fc aff1b54efbbf4a56b88d6d92790190d0 df64b634751947d5b0fbd813f813d5e7] [instance: 5420df8c-b1ad-4373-bc6f-c2088325ad89] Terminating instance
2012-10-09 13:25:00 ERROR nova.manager [-] Error during ComputeManager.update_available_resource: Unable to read from monitor: Connection reset by peer
2012-10-09 13:25:00 TRACE nova.manager Traceback (most recent call last):
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/manager.py", line 175, in periodic_tasks
2012-10-09 13:25:00 TRACE nova.manager task(self, context)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2804, in update_available_resource
2012-10-09 13:25:00 TRACE nova.manager self.resource_tracker.update_available_resource(context)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/utils.py", line 760, in inner
2012-10-09 13:25:00 TRACE nova.manager retval = f(*args, **kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 390, in update_available_resource
2012-10-09 13:25:00 TRACE nova.manager resources = self.driver.get_available_resource()
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2220, in get_available_resource
2012-10-09 13:25:00 TRACE nova.manager 'disk_available_least': self.get_disk_available_least()}
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2712, in get_disk_available_least
2012-10-09 13:25:00 TRACE nova.manager self.get_instance_disk_info(i_name))
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2659, in get_instance_disk_info
2012-10-09 13:25:00 TRACE nova.manager xml = virt_dom.XMLDesc(0)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 187, in doit
2012-10-09 13:25:00 TRACE nova.manager result = proxy_call(self._autowrap, f, *args, **kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 147, in proxy_call
2012-10-09 13:25:00 TRACE nova.manager rv = execute(f,*args,**kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 76, in tworker
2012-10-09 13:25:00 TRACE nova.manager rv = meth(*args,**kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib64/python2.7/site-packages/libvirt.py", line 381, in XMLDesc
2012-10-09 13:25:00 TRACE nova.manager if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
2012-10-09 13:25:00 TRACE nova.manager libvirtError: Unable to read from monitor: Connection reset by peer
2012-10-09 13:25:00 TRACE nova.manager
2012-10-09 13:25:00 INFO nova.virt.libvirt.driver [-] [instance: 5420df8c-b1ad-4373-bc6f-c2088325ad89] Instance destroyed successfully.

Dan Prince (dan-prince) on 2012-10-09

Changed in nova:
importance:	Undecided → High

Daniel Berrange (berrange) wrote on 2012-10-11:

> 2012-10-09 13:25:00 TRACE nova.manager libvirtError: Unable to read from monitor: Connection reset by peer

This indicates that QEMU shut down while libvirt was talking to its monitor to fulfill the virDomainGetXMLDesc() API call.

This evidently happened because another thread was destroying that instance while the stats collection thread was working.

I guess the stats collection thread needs to deal with the possibility that a VM can disappear while it's talking to it, instead of throwing the exception.
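
As a rough illustration of the idea above, the following sketch shows a collection loop that skips any instance that vanishes mid-scan instead of aborting the whole periodic task. It is not nova's code: InstanceNotFound here is a local stand-in class, and sum_disk_usage and fake_get_disk_info are hypothetical names invented for this example.

```python
class InstanceNotFound(Exception):
    """Local stand-in for an 'instance has gone away' error (assumption)."""


def sum_disk_usage(instance_names, get_disk_info):
    """Add up per-instance disk figures, skipping instances that disappear.

    get_disk_info is any callable that returns a number for an instance
    name, or raises InstanceNotFound if the instance no longer exists.
    """
    total = 0
    skipped = []
    for name in instance_names:
        try:
            total += get_disk_info(name)
        except InstanceNotFound:
            # The VM was destroyed by another thread while we were
            # collecting stats; ignore it and carry on with the rest.
            skipped.append(name)
    return total, skipped


def fake_get_disk_info(name):
    """Hypothetical lookup: the second instance vanishes during the scan."""
    if name == "instance-00000002":
        raise InstanceNotFound(name)
    return 10


total, skipped = sum_disk_usage(
    ["instance-00000001", "instance-00000002", "instance-00000003"],
    fake_get_disk_info,
)
print(total, skipped)
```

With this shape, one instance being terminated during the scan only drops that instance from the totals for the current run; the next periodic run sees a consistent list again.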

Russell Bryant (russellb) on 2012-11-01

Changed in nova:
status:	New → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2013-01-09: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/19287

Changed in nova:
assignee:	nobody → wangpan (hzwangpan)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2013-01-23: Fix merged to nova (master)

Reviewed: https://review.openstack.org/19287
Committed: http://github.com/openstack/nova/commit/67376d34ed6bf68b15ac87ef444887cbc27dc6b0
Submitter: Jenkins
Branch: master

commit 67376d34ed6bf68b15ac87ef444887cbc27dc6b0
Author: Wangpan <email address hidden>
Date: Wed Jan 9 19:33:43 2013 +0800

Map libvirt error to InstanceNotFound in get_instance_disk_info

    When getting instance disk info, the instance may be destroyed/deleted, and a
    libvirtError will be raised in XMLDesc method, so catching and mapping it to
    InstanceNotFound and the caller can handle it correctly.

Fixes bug #1064581

Change-Id: I07fed3e82e10dad4cb84ae5c8650ada351c24e78
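
The approach described in the commit message can be sketched roughly as follows. This is not the merged patch: FakeLibvirtError and InstanceNotFound are local stand-ins defined so the example runs without libvirt or nova installed, and get_domain_xml, VanishedDomain and HealthyDomain are hypothetical names. Only the XMLDesc(0) call and the error text are taken from the trace above.

```python
class FakeLibvirtError(Exception):
    """Stand-in for libvirt.libvirtError (assumption, keeps this runnable)."""


class InstanceNotFound(Exception):
    """Stand-in for the exception the caller already knows how to handle."""

    def __init__(self, instance_id):
        super().__init__("Instance %s could not be found." % instance_id)
        self.instance_id = instance_id


class VanishedDomain:
    """Hypothetical domain whose QEMU process died during the call."""

    def XMLDesc(self, flags):
        raise FakeLibvirtError(
            "Unable to read from monitor: Connection reset by peer")


class HealthyDomain:
    """Hypothetical domain that answers normally."""

    def XMLDesc(self, flags):
        return "<domain/>"


def get_domain_xml(virt_dom, instance_name):
    """Fetch the domain XML, mapping a libvirt failure to InstanceNotFound."""
    try:
        return virt_dom.XMLDesc(0)
    except FakeLibvirtError as ex:
        # Treat the failure as "the instance is gone" so callers can skip it.
        raise InstanceNotFound(instance_name) from ex


print(get_domain_xml(HealthyDomain(), "instance-00000001"))
try:
    get_domain_xml(VanishedDomain(), "instance-00000002")
except InstanceNotFound as err:
    print("skipped:", err.instance_id)
```

A real implementation would probably also want to log the original libvirt error before re-raising, since not every libvirtError necessarily means the instance was deleted.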