Ceilometer agent compute cannot reconnect to RabbitMQ after RabbitMQ failover

Bug #1552779 reported by Dmitry Sutyagin
This bug affects 1 person
Affects             Status   Importance  Assigned to      Milestone
Mirantis OpenStack  Invalid  High        Dmitry Sutyagin
6.1.x               Invalid  High        Dmitry Sutyagin

Bug Description

After a RabbitMQ failover, the following error appears in the ceilometer-agent-compute log:

node-3:/var/log/ceilometer# date
Thu Mar 3 16:12:34 UTC 2016
node-3:/var/log/ceilometer# tail -n 20 ceilometer-agent-compute.log
2016-03-02 03:38:01.240 17372 INFO ceilometer.agent [-] Polling pollster network.incoming.bytes in the context of network_source
2016-03-02 03:38:01.347 17372 INFO ceilometer.agent [-] Polling pollster memory.usage in the context of meter_source
2016-03-02 03:38:01.357 17372 INFO ceilometer.agent [-] Polling pollster instance in the context of meter_source
2016-03-02 03:38:01.461 17372 INFO ceilometer.agent [-] Polling pollster network.outgoing.packets in the context of meter_source
2016-03-02 03:39:00.040 17372 ERROR oslo.messaging._drivers.impl_rabbit [-] (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 920, in _heartbeat_predicate
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit conn.drain_events(timeout=0.01)
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 623, in drain_events
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit self.connection.drain_events(timeout=timeout)
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 320, in drain_events
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit return amqp_method(channel, args)
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 526, in _close
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit (class_id, method_id), ConnectionError)
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
2016-03-02 03:39:00.040 17372 TRACE oslo.messaging._drivers.impl_rabbit
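
For triage, the two facts visible in the excerpt above (why the broker closed the connection, and how long the agent has been silent since) can be pulled out of the log mechanically. The following is only a sketch: the function names are hypothetical, and the regular expression and timestamp format are assumptions based on the lines shown in this report, not on any oslo.messaging API.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the log excerpt: "2016-03-02 03:39:00.040"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
LOG_TS_LENGTH = 23

# Matches e.g. "(0, 0): (320) CONNECTION_FORCED - broker forced ..."
CLOSE_RE = re.compile(r"\((\d+), (\d+)\): \((\d+)\) (\w+) - (.*)$")


def parse_close(line):
    """Extract the AMQP close details from a log line, or return None."""
    match = CLOSE_RE.search(line)
    if match is None:
        return None
    class_id, method_id, code, name, text = match.groups()
    return {
        "class_id": int(class_id),
        "method_id": int(method_id),
        "reply_code": int(code),
        "reply_name": name,
        "reply_text": text,
    }


def last_log_time(lines):
    """Return the timestamp of the last line that begins with one."""
    for line in reversed(lines):
        try:
            return datetime.strptime(line[:LOG_TS_LENGTH], LOG_TS_FORMAT)
        except ValueError:
            continue  # blank or continuation line without a timestamp
    return None


def silence(lines, now):
    """Return how long the log has been quiet as a timedelta, or None."""
    last = last_log_time(lines)
    return None if last is None else now - last
```

Applied to the excerpt above with the `date` output as `now`, this reports reply code 320 (CONNECTION_FORCED) and a gap of more than 36 hours between the last log record and the time the log was inspected.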

No new messages are logged after that, although the service is still running. strace shows that the service is constantly calling epoll_wait:

node-3:/var/log/ceilometer# strace -p $(pgrep -f ceilometer-agent-compute)
Process 17372 attached - interrupt to quit
epoll_wait(5, {}, 1023, 26) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 49) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
epoll_wait(5, {}, 1023, 0) = 0
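
To put a number on "constantly calling epoll_wait", captured strace output can be summarized by counting how many calls used a zero timeout and how many returned any ready descriptors. This is a rough sketch with a hypothetical helper name; the regular expression assumes the line layout shown in the strace output above.

```python
import re

# Assumed layout: epoll_wait(<epfd>, <events>, <maxevents>, <timeout_ms>) = <ready>
EPOLL_RE = re.compile(
    r"^epoll_wait\((\d+), .*?, (\d+), (-?\d+)\)\s*=\s*(-?\d+)"
)


def epoll_summary(lines):
    """Count epoll_wait calls, zero-timeout calls, and calls that saw events."""
    total = zero_timeout = with_events = 0
    for line in lines:
        match = EPOLL_RE.match(line.strip())
        if match is None:
            continue  # not an epoll_wait line (e.g. the "Process ... attached" banner)
        total += 1
        timeout_ms = int(match.group(3))
        ready = int(match.group(4))
        if timeout_ms == 0:
            zero_timeout += 1
        if ready > 0:
            with_events += 1
    return {
        "total": total,
        "zero_timeout": zero_timeout,
        "with_events": with_events,
    }
```

For the capture above, every call returns 0 ready descriptors and most calls use a zero timeout, which is consistent with the event loop spinning while no socket ever becomes readable.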

Changed in mos:
status: New → Confirmed
importance: Undecided → High
Dmitry Sutyagin (dsutyagin) wrote :

Best I could do in terms of getting a traceback:

(gdb) bt
#0 0x00007f127bd64f82 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000512b8e in ?? ()
#2 0x00000000004b5d01 in PyEval_EvalFrameEx ()
#3 0x00000000004b6257 in PyEval_EvalFrameEx ()
#4 0x00000000004bc463 in PyEval_EvalCodeEx ()
#5 0x00000000004b645b in PyEval_EvalFrameEx ()
#6 0x00000000004bc463 in PyEval_EvalCodeEx ()
#7 0x00000000004491df in ?? ()
#8 0x000000000041b10a in PyObject_Call ()
#9 0x00000000004306be in ?? ()
#10 0x000000000041b10a in PyObject_Call ()
#11 0x00000000004b54d6 in PyEval_CallObjectWithKeywords ()
#12 0x00007f127b652a66 in ?? () from /usr/lib/python2.7/dist-packages/greenlet.so
#13 0x00007f127b6523b0 in ?? () from /usr/lib/python2.7/dist-packages/greenlet.so
#14 0x00007f127b652f36 in ?? () from /usr/lib/python2.7/dist-packages/greenlet.so
#15 0x00000000004b5d01 in PyEval_EvalFrameEx ()
#16 0x00000000004b6257 in PyEval_EvalFrameEx ()
#17 0x00000000004bc463 in PyEval_EvalCodeEx ()
#18 0x00000000004b645b in PyEval_EvalFrameEx ()
#19 0x00000000004b6257 in PyEval_EvalFrameEx ()
#20 0x00000000004b6257 in PyEval_EvalFrameEx ()
#21 0x00000000004b6257 in PyEval_EvalFrameEx ()
#22 0x00000000004b6257 in PyEval_EvalFrameEx ()
#23 0x00000000004b6257 in PyEval_EvalFrameEx ()
#24 0x00000000004bc463 in PyEval_EvalCodeEx ()
#25 0x00000000004b645b in PyEval_EvalFrameEx ()
#26 0x00000000004bc463 in PyEval_EvalCodeEx ()
#27 0x00000000004b645b in PyEval_EvalFrameEx ()
#28 0x00000000004b6257 in PyEval_EvalFrameEx ()
#29 0x00000000004bc463 in PyEval_EvalCodeEx ()
#30 0x00000000004bcf12 in PyEval_EvalCode ()
#31 0x00000000004dc202 in ?? ()
#32 0x00000000004dcdf4 in PyRun_FileExFlags ()
#33 0x00000000004dd8fe in PyRun_SimpleFileExFlags ()
#34 0x00000000004ee202 in Py_Main ()
#35 0x00007f127bc9276d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#36 0x000000000041cbd9 in _start ()

Not really useful, I guess.

Dmitry Mescheryakov (dmitrymex) wrote :

I am afraid the provided information is not enough to diagnose the issue. My best guess is that this is a duplicate of https://bugs.launchpad.net/fuel/+bug/1496000 , the fix for which was merged into 6.0-updates a month before this issue was filed.

Vitaly Sedelnik (vsedelnik) wrote :

Setting to Incomplete and assigning to Dmitry Sutyagin (reporter).

Dmitry, https://bugs.launchpad.net/fuel/+bug/1496000 (mentioned in comment #2) is fixed for 6.0 and 6.1. Is there some other issue, or can this bug be closed as a duplicate?

Changed in mos:
status: Confirmed → Incomplete
assignee: MOS Ceilometer (mos-ceilometer) → Dmitry Sutyagin (dsutyagin)
Vitaly Sedelnik (vsedelnik) wrote :

Marking as Invalid because it has stayed in Incomplete for more than a month.

Changed in mos:
status: Incomplete → Invalid