After a heat stack-update, all nova-compute services are down.
The problem exists in both the 3 controller + n compute and the 1 controller + n compute configurations.
The overcloud image was built from trunk on 10/20/2014.
Nova conductor log:
Oct 27 20:10:48 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:10:48.992 28817 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
Oct 27 20:10:49 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:10:49.248 28813 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
Oct 27 20:10:49 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:10:49.992 28817 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 162.3.121.200:5672
Oct 27 20:10:50 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:10:50.248 28813 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 162.3.121.200:5672
Oct 27 20:11:00 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:11:00.119 28816 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 162.3.121.200:5672 closed the connection. Check login credentials: Socket closed
Oct 27 20:11:01 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:11:01.143 28822 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 162.3.121.200:5672 closed the connection. Check login credentials: Socket closed
Oct 27 20:11:03 ci-overcloud-controller0-kbteq7w5rqla nova-conductor: 2014-10-27 20:11:03.447 28817 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 162.3.121.200:5672 closed the connection. Check login credentials: Socket closed
Nova compute log:
Oct 27 20:01:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:01:36.509 22247 TRACE nova.openstack.common.periodic_task
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.570 22247 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x7f454c8db710>> run outlasted interval by 110.06 sec
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._run_pending_deletes: Timed out waiting for a reply to message ID 9c691f820eb648d6ac58d9107a4fcf27
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 198, in run_periodic_tasks
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task task(self, context)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 6205, in _run_pending_deletes
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task context, filters, expected_attrs=attrs, use_slave=True)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/objects/base.py", line 153, in wrapper
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task args, kwargs)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 346, in object_class_action
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task objver=objver, args=args, kwargs=kwargs)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 152, in call
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task retry=self.retry)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task timeout=timeout, retry=retry)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 408, in send
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task retry=retry)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 397, in _send
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task result = self._waiter.wait(msg_id, timeout)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 298, in wait
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task reply, ending, trylock = self._poll_queue(msg_id, timeout)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 238, in _poll_queue
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task message = self.waiters.get(msg_id, timeout)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 144, in get
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task 'to message ID %s' % msg_id)
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID 9c691f820eb648d6ac58d9107a4fcf27
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.571 22247 TRACE nova.openstack.common.periodic_task
Oct 27 20:02:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:02:36.573 22247 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 162.3.121.200:5672
Oct 27 20:03:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:03:36.639 22247 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x7f454c8db710>> run outlasted interval by 50.07 sec
Oct 27 20:04:36 ci-overcloud-novacompute0-cdfjqk62jmo3 nova-compute: 2014-10-27 20:04:36.702 22247 WARNING nova.openstack.common.loopingcall
After changing nova.conf so that nova-conductor connects directly to the rabbitmq server, bypassing haproxy, everything works again.
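To compare the haproxy path with the direct path outside of nova, a small probe along these lines might help. It is only a sketch: it sends the AMQP 0-9-1 protocol header and reports whether the peer answers with a frame or simply closes the socket. The function names and result labels are made up for illustration, and the direct broker address in the usage comment is a placeholder, not a value from this deployment.

```python
import socket

# AMQP 0-9-1 protocol header that a client sends right after connecting.
AMQP_HEADER = b"AMQP\x00\x00\x09\x01"


def classify_reply(reply):
    """Map the first bytes received from the peer to a rough diagnosis."""
    if not reply:
        # Peer closed without sending anything, e.g. a proxy with no
        # usable backend or a broker that dropped the connection.
        return "closed"
    if reply.startswith(b"AMQP"):
        # Broker answered with its own protocol header: version mismatch.
        return "version-mismatch"
    if reply[0:1] == b"\x01":
        # Frame type 1 is a method frame (expected: Connection.Start).
        return "amqp-ok"
    return "unexpected"


def probe_amqp(host, port, timeout=5.0):
    """Open a TCP connection, send the AMQP header, classify the answer."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(AMQP_HEADER)
            reply = sock.recv(64)
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "closed"
    except OSError:
        return "unreachable"
    return classify_reply(reply)


# Example usage (not executed here):
#   probe_amqp("162.3.121.200", 5672)       # endpoint seen in the logs above
#   probe_amqp("<rabbitmq-node-ip>", 5672)  # placeholder for the direct broker address
```

If the endpoint from the logs reports "closed" while the direct broker address reports "amqp-ok", that would point at the proxy layer rather than at the credentials. The probe stops before authentication, so it cannot confirm or rule out a login problem by itself.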
Hi Jerry, can you provide logs from the rabbitmq server? I wonder if haproxy is confusing rabbitmq-server somehow. I believe they'd be in /var/log/rabbitmq. Thanks.