Periodic tasks fail if a qemu process becomes defunct

Bug #1270008 reported by wangpan
This bug affects 1 person
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   Medium       wangpan
Havana                     Fix Released   Medium       wangpan

Bug Description

I am using stable havana nova.
I got an exception while deleting my KVM instance, and the qemu process of this instance became 'defunct' for some unknown reason (possibly a qemu/kvm bug). Since then, the update_available_resource periodic task has failed unexpectedly on every run with the exception below, so the resources of this compute node are never reported. I think we should handle this exception while running the periodic task.
2014-01-16 15:53:28.421 47954 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: cannot get CPU affinity of process 62279: No such process
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/openstack/common/periodic_task.py", line 180, in run_periodic_tasks
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task task(self, context)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5617, in update_available_resource
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task rt.update_available_resource(context)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 246, in inner
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task return f(*args, **kwargs)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 281, in update_available_resource
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task resources = self.driver.get_available_resource(self.nodename)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4275, in get_available_resource
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task stats = self.host_state.get_host_stats(refresh=True)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5350, in get_host_stats
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task self.update_status()
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5386, in update_status
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task data["vcpus_used"] = self.driver.get_vcpu_used()
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3949, in get_vcpu_used
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task vcpus = dom.vcpus()
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task result = proxy_call(self._autowrap, f, *args, **kwargs)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task rv = execute(f,*args,**kwargs)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task rv = meth(*args,**kwargs)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/libvirt.py", line 2222, in vcpus
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task if ret == -1: raise libvirtError ('virDomainGetVcpus() failed', dom=self)
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task libvirtError: cannot get CPU affinity of process 62279: No such process
2014-01-16 15:53:28.421 47954 TRACE nova.openstack.common.periodic_task

And this is the exception raised while I was deleting the instance:
2014-01-16 15:13:26.640 47954 ERROR nova.openstack.common.rpc.amqp [req-03ed9463-0740-4423-bf1e-2334ed29ee5c 9537af4d80e546409b670673f9a81388 3179fc9d69d747b4a06f27a6d2334050] Exception during message handling
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp **args)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 400, in decorated_function
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 90, in wrapped
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp payload)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 73, in wrapped
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 290, in decorated_function
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp pass
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 276, in decorated_function
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 341, in decorated_function
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 318, in decorated_function
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 305, in decorated_function
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2128, in terminate_instance
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp do_terminate_instance(instance, bdms, clean_shutdown)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 246, in inner
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp return f(*args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2120, in do_terminate_instance
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp reservations=reservations)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 105, in inner
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp rv = f(*args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2091, in _delete_instance
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp user_id=user_id)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2063, in _delete_instance
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp clean_shutdown=clean_shutdown)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1984, in _shutdown_instance
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp requested_networks)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1974, in _shutdown_instance
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp context=context, clean_shutdown=clean_shutdown)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 883, in destroy
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp self._destroy(instance)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 838, in _destroy
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp instance=instance)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 810, in _destroy
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp virt_dom.destroy()
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp result = proxy_call(self._autowrap, f, *args, **kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp rv = execute(f,*args,**kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp rv = meth(*args,**kwargs)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/libvirt.py", line 760, in destroy
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp libvirtError: Failed to terminate process 61945 with SIGKILL: Device or resource busy
2014-01-16 15:13:26.640 47954 TRACE nova.openstack.common.rpc.amqp

wangpan (hzwangpan)
description: updated
description: updated
wangpan (hzwangpan) wrote :
Changed in nova:
assignee: nobody → wangpan (hzwangpan)
Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/67361
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=844df860c38ac38550b8d1739fd53131cd7fd864
Submitter: Jenkins
Branch: master

commit 844df860c38ac38550b8d1739fd53131cd7fd864
Author: Wangpan <email address hidden>
Date: Fri Jan 17 10:35:12 2014 +0800

    libvirt: handle exception while get vcpu info

    If an exception is raised while get a libvirt domain's vcpu info,
    the update_available_resource periodic task will be failed, which
    will result in the resource of this compute node will never be
    reported.

    This patch add an exception handling to avoid this situation.
    Closes-bug: #1270008

    Change-Id: I69109402416989fdaa421f8dbc72953bd067c407
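
The idea described in the commit message above can be illustrated with a minimal sketch. This is not the merged patch: FakeDomain, DomainError, and count_vcpus_used are hypothetical names invented for illustration, DomainError stands in for libvirt.libvirtError so the example runs without libvirt installed, and the two-element return value of vcpus() is an assumed shape. The point is only that a failure on one domain is logged and skipped rather than aborting the whole vCPU count.

```python
import logging

LOG = logging.getLogger(__name__)


class DomainError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain object."""

    def __init__(self, name, vcpu_count, defunct=False):
        self.name = name
        self._vcpu_count = vcpu_count
        self._defunct = defunct

    def vcpus(self):
        # A defunct qemu process makes the vCPU query fail, as in the
        # traceback from the bug description.
        if self._defunct:
            raise DomainError(
                "cannot get CPU affinity of process: No such process")
        # Assumed shape: (per-vCPU info list, per-vCPU pinning maps).
        info = [(i, 1, 0, i) for i in range(self._vcpu_count)]
        cpumaps = [(True,) for _ in range(self._vcpu_count)]
        return info, cpumaps


def count_vcpus_used(domains):
    """Sum the vCPUs of all domains, skipping domains that cannot be queried."""
    total = 0
    for dom in domains:
        try:
            vcpus = dom.vcpus()
        except DomainError as exc:
            # Log and move on so one broken domain does not fail the
            # whole resource update.
            LOG.warning("could not get vCPU info for domain %s: %s",
                        dom.name, exc)
            continue
        total += len(vcpus[1])
    return total


if __name__ == "__main__":
    domains = [
        FakeDomain("healthy-1", 2),
        FakeDomain("defunct", 4, defunct=True),
        FakeDomain("healthy-2", 1),
    ]
    print(count_vcpus_used(domains))
```

In this sketch the defunct domain contributes nothing to the total, so the reported usage may undercount until the domain is cleaned up; that trade-off is accepted here in exchange for keeping the periodic resource report alive.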

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
importance: Undecided → Medium
tags: added: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/72129

Changed in nova:
milestone: none → icehouse-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/72129
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=83926a0022bbb0b3a68e0a91f7aa0acb1b9cc969
Submitter: Jenkins
Branch: stable/havana

commit 83926a0022bbb0b3a68e0a91f7aa0acb1b9cc969
Author: Wangpan <email address hidden>
Date: Fri Jan 17 10:35:12 2014 +0800

    libvirt: handle exception while get vcpu info

    If an exception is raised while get a libvirt domain's vcpu info,
    the update_available_resource periodic task will be failed, which
    will result in the resource of this compute node will never be
    reported.

    This patch add an exception handling to avoid this situation.
    Closes-bug: #1270008

    Change-Id: I69109402416989fdaa421f8dbc72953bd067c407
    (cherry picked from commit 844df860c38ac38550b8d1739fd53131cd7fd864)

tags: added: in-stable-havana
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1