More useful is the stack trace of the original error, rather than the RPC error:
2014-08-07 03:31:46.400 ERROR nova.virt.libvirt.driver [req-39ce7a3d-5ceb-41f5-8f9f-face7e608bd1 ServerRescueTestJSON-2035684545 ServerRescueTestJSON-1017508309] [instance: 0573094d-53da-40a5-948a-747d181462f5] Error from libvirt during destroy. Code=38 Error=Failed to terminate process 26425 with SIGKILL: Device or resource busy
2014-08-07 03:31:46.400 ERROR nova.compute.manager [req-39ce7a3d-5ceb-41f5-8f9f-face7e608bd1 ServerRescueTestJSON-2035684545 ServerRescueTestJSON-1017508309] [instance: 0573094d-53da-40a5-948a-747d181462f5] Error trying to Rescue Instance
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] Traceback (most recent call last):
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/opt/stack/new/nova/nova/compute/manager.py", line 3088, in rescue_instance
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] rescue_image_meta, admin_password)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2279, in rescue
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] self._destroy(instance)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 936, in _destroy
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] instance=instance)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] six.reraise(self.type_, self.value, self.tb)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 908, in _destroy
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] virt_dom.destroy()
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] result = proxy_call(self._autowrap, f, *args, **kwargs)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] rv = execute(f,*args,**kwargs)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] rv = meth(*args,**kwargs)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 918, in destroy
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5] libvirtError: Failed to terminate process 26425 with SIGKILL: Device or resource busy
2014-08-07 03:31:46.400 22829 TRACE nova.compute.manager [instance: 0573094d-53da-40a5-948a-747d181462f5]
So this is the 'rescue' method, attempting to kill off the existing running guest before starting the guest in its rescued configuration.
When libvirt kills a process it first sends SIGTERM and waits 10 seconds. If the process hasn't gone away, it sends SIGKILL and waits another 5 seconds. If it still hasn't gone away, you get this EBUSY error.
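The escalation described above can be sketched in Python as follows. This is an illustration of the behaviour, not libvirt's actual implementation (which is in C); the helper names and the use of /proc are my own, and /proc makes it Linux-only. Reading /proc/<pid>/stat rather than sending signal 0 means an unreaped zombie is correctly treated as already dead:

```python
import errno
import os
import signal
import time


def pid_alive(pid):
    """Report whether the process still exists and has not yet exited.

    Reads /proc/<pid>/stat (Linux-specific) so that a zombie, which has
    already exited but not been reaped, counts as gone.
    """
    try:
        with open("/proc/%d/stat" % pid) as f:
            # The state field follows the closing paren of the comm field.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except IOError:
        return False
    return state != "Z"


def kill_process(pid, sigterm_wait=10.0, sigkill_wait=5.0, poll=0.2):
    """SIGTERM and wait; escalate to SIGKILL and wait; then give up with EBUSY."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.time() + sigterm_wait
    while time.time() < deadline:
        if not pid_alive(pid):
            return
        time.sleep(poll)
    os.kill(pid, signal.SIGKILL)
    deadline = time.time() + sigkill_wait
    while time.time() < deadline:
        if not pid_alive(pid):
            return
        time.sleep(poll)
    raise OSError(errno.EBUSY,
                  "Failed to terminate process %d with SIGKILL" % pid)
```

The key point the sketch makes concrete is that the total budget is a fixed 15 wallclock seconds regardless of how much CPU time the dying process actually gets.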
Usually when a QEMU process fails to go away upon SIGKILL it is because it is stuck in an uninterruptible kernel sleep waiting on I/O from some non-responsive server.
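When debugging this by hand, the stuck state is visible in /proc. A small (Linux-only, illustrative) helper to read the scheduler state of a suspect QEMU process:

```python
import os


def proc_state(pid):
    """Return the one-letter scheduler state from /proc/<pid>/stat.

    'D' means uninterruptible sleep: the process is blocked inside the
    kernel, typically on I/O, and even SIGKILL cannot take effect until
    the blocking operation completes.
    """
    with open("/proc/%d/stat" % pid) as f:
        # comm may itself contain spaces, so split after the last ')'.
        return f.read().rsplit(")", 1)[1].split()[0]
```

If this returns 'D' for the QEMU PID in the error message, the SIGKILL was delivered but cannot be acted on yet, which matches the EBUSY explanation above.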
Given the CPU load of the gate tests though, it is conceivable that the 15 second timeout is too short, particularly if the VM running tempest has a high steal time from the cloud host. i.e. 15 wallclock seconds may have passed, but the VM might have had only a few seconds of scheduled run time.
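One way to sanity-check that theory on a gate VM is to look at the steal counter the kernel exposes in /proc/stat. A hypothetical probe (Linux-only; the counters are cumulative since boot, so sample twice and diff for a current rate):

```python
def cpu_steal_fraction():
    """Fraction of CPU time stolen by the hypervisor, from /proc/stat.

    The aggregate 'cpu' line lists jiffies as: user nice system idle
    iowait irq softirq steal [guest guest_nice]; 'steal' is the 8th
    field, and may be absent on old kernels.
    """
    with open("/proc/stat") as f:
        values = [int(v) for v in f.readline().split()[1:]]
    steal = values[7] if len(values) > 7 else 0
    return steal / float(sum(values))
```

A steal fraction of, say, 0.7 would mean a 15 wallclock-second timeout corresponds to only about 4.5 seconds of scheduled run time, which supports the idea that the timeout is effectively much shorter under gate load.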