We need to know a few things:
1-> Did RabbitMQ restart at any point?
2-> Was the network connectivity from nova-compute to RabbitMQ stable?
3-> Are they running HA?
I see the following in the log messages:
2015-03-24 18:35:14.320 2972 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID 446d5968d5ff469ea71c84a85d9f2b6d
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task task(self, context)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5460, in update_available_resource
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task rt.update_available_resource(context)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/openstack/common/lockutils.py", line 249, in inner
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task return f(*args, **kwargs)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 315, in update_available_resource
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task context, self.host, self.nodename)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/objects/base.py", line 110, in wrapper
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task args, kwargs)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 425, in object_class_action
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task objver=objver, args=args, kwargs=kwargs)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task wait_for_reply=True, timeout=timeout)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task timeout=timeout)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task result = self._waiter.wait(msg_id, timeout)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 280, in wait
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task reply, ending, trylock = self._poll_queue(msg_id, timeout)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 220, in _poll_queue
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task message = self.waiters.get(msg_id, timeout)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 126, in get
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task 'to message ID %s' % msg_id)
2015-03-24 18:35:14.320 2972 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID
This could be the reason the compute was marked as not happy.
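For context on the "XXX" status: nova's DB servicegroup driver (nova/servicegroup/drivers/db.py) marks a service down once its last heartbeat is older than service_down_time, so a compute stuck in a MessagingTimeout long enough to miss heartbeats will read as down. A minimal sketch of that staleness check; the 10s report_interval and 60s service_down_time used here are the usual defaults and should be confirmed against the deployment's nova.conf:

import datetime

# Sketch of the staleness check the DB servicegroup driver performs:
# a service shows as "XXX" in `nova-manage service list` once its last
# heartbeat is older than service_down_time.
SERVICE_DOWN_TIME = 60  # nova.conf default; report_interval defaults to 10

def service_is_up(last_heartbeat):
    # True if the heartbeat is recent enough to count as "up".
    elapsed = (datetime.datetime.utcnow() - last_heartbeat).total_seconds()
    return abs(elapsed) <= SERVICE_DOWN_TIME

# A compute blocked on RPC for ~90s would already read as down:
stale = datetime.datetime.utcnow() - datetime.timedelta(seconds=90)
print(service_is_up(stale))  # False -> "XXX" in the service list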
Could you please ask them to attach /var/log/contrail/ha/rmq-monitor.log? This will help us check whether RMQ was stable.
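In the meantime, a quick way to sanity-check AMQP reachability from the compute node is a small kombu probe, since oslo.messaging's rabbit driver is built on kombu. This is only a sketch; the broker URL below is a placeholder, so substitute the rabbit_host, rabbit_port, and credentials from the node's nova.conf:

import socket
import kombu

# Placeholder URL -- take host/port/credentials from nova.conf.
BROKER_URL = 'amqp://guest:guest@127.0.0.1:5672//'

try:
    conn = kombu.Connection(BROKER_URL, connect_timeout=5)
    conn.connect()  # raises if the broker is unreachable or refuses us
    print('AMQP connection OK')
    conn.release()
except (socket.error, socket.timeout) as exc:
    print('AMQP connection failed: %s' % exc)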
Thanks,
Sanju
From: Nagabhushana R <email address hidden>
Date: Wednesday, April 1, 2015 11:54 PM
To: Sanju Abraham <email address hidden>
Subject: Fwd: [Bug 1439145] [NEW] nova-compute status is "XXX" in nova-manage service list
Would you know more on this…?