I'm seeing the following stack trace occur frequently in Nova compute.log when running test suites. The exception seems to occur when instances are terminated. I've seen it both on SmokeStack and again today it happened to me when running the latest Grizzly trunk code manually to investigate the issue further (Git Hash 36b8525).
2012-10-09 13:24:56 AUDIT nova.compute.manager [req-4909f79a-f35a-4590-88fc-5a94ad6fc9fc aff1b54efbbf4a56b88d6d92790190d0 df64b634751947d5b0fbd813f813d5e7] [instance: 5420df8c-b1ad-4373-bc6f-c2088325ad89] Terminating instance
2012-10-09 13:25:00 ERROR nova.manager [-] Error during ComputeManager.update_available_resource: Unable to read from monitor: Connection reset by peer
2012-10-09 13:25:00 TRACE nova.manager Traceback (most recent call last):
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/manager.py", line 175, in periodic_tasks
2012-10-09 13:25:00 TRACE nova.manager task(self, context)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2804, in update_available_resource
2012-10-09 13:25:00 TRACE nova.manager self.resource_tracker.update_available_resource(context)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/utils.py", line 760, in inner
2012-10-09 13:25:00 TRACE nova.manager retval = f(*args, **kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 390, in update_available_resource
2012-10-09 13:25:00 TRACE nova.manager resources = self.driver.get_available_resource()
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2220, in get_available_resource
2012-10-09 13:25:00 TRACE nova.manager 'disk_available_least': self.get_disk_available_least()}
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2712, in get_disk_available_least
2012-10-09 13:25:00 TRACE nova.manager self.get_instance_disk_info(i_name))
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2659, in get_instance_disk_info
2012-10-09 13:25:00 TRACE nova.manager xml = virt_dom.XMLDesc(0)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 187, in doit
2012-10-09 13:25:00 TRACE nova.manager result = proxy_call(self._autowrap, f, *args, **kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 147, in proxy_call
2012-10-09 13:25:00 TRACE nova.manager rv = execute(f,*args,**kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 76, in tworker
2012-10-09 13:25:00 TRACE nova.manager rv = meth(*args,**kwargs)
2012-10-09 13:25:00 TRACE nova.manager File "/usr/lib64/python2.7/site-packages/libvirt.py", line 381, in XMLDesc
2012-10-09 13:25:00 TRACE nova.manager if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
2012-10-09 13:25:00 TRACE nova.manager libvirtError: Unable to read from monitor: Connection reset by peer
2012-10-09 13:25:00 TRACE nova.manager
2012-10-09 13:25:00 INFO nova.virt.libvirt.driver [-] [instance: 5420df8c-b1ad-4373-bc6f-c2088325ad89] Instance destroyed successfully.
> 2012-10-09 13:25:00 TRACE nova.manager libvirtError: Unable to read from monitor: Connection reset by peer
This indicates that QEMU shut down while libvirt was talking to its monitor to fulfill the virDomainGetXMLDesc() API call.
This obviously happened because another thread was destroying that instance while the stats collection thread was working.
I guess the stats collection thread needs to deal with the possibility that a VM can disappear while it's talking to it, instead of letting the exception propagate.
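A minimal sketch of that idea: when iterating over domains during stats collection, catch libvirtError for a domain that vanished mid-loop and skip it instead of letting the periodic task die. The stand-in libvirtError class and the helper names here are illustrative, not Nova's actual code; in practice the error surfaced during instance teardown may carry a different code than VIR_ERR_NO_DOMAIN, so the real fix may need to match on more than one code.

```python
VIR_ERR_NO_DOMAIN = 42  # libvirt's "domain not found" error code

class libvirtError(Exception):
    """Stand-in for libvirt.libvirtError so this sketch is self-contained."""
    def __init__(self, msg, code):
        super().__init__(msg)
        self._code = code

    def get_error_code(self):
        return self._code

def sum_disk_info(domains, get_disk_info):
    """Aggregate per-domain disk info, tolerating domains destroyed mid-loop.

    domains: iterable of domain handles (opaque here)
    get_disk_info: callable that may raise libvirtError if the domain is gone
    """
    total = 0
    for dom in domains:
        try:
            total += get_disk_info(dom)
        except libvirtError as e:
            # Another thread destroyed this instance between listing it
            # and inspecting it; skip it rather than aborting the whole
            # update_available_resource periodic task.
            if e.get_error_code() == VIR_ERR_NO_DOMAIN:
                continue
            raise
    return total
```

The key design point is that the collector treats "domain disappeared" as an expected race, not an error, while still re-raising anything it doesn't recognize.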