ceilometer-rabbitmq-failover
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | New | High | MOS Ceilometer |
6.1.x | Confirmed | High | MOS Ceilometer |
7.0.x | New | Undecided | MOS Ceilometer |
8.0.x | New | Undecided | MOS Ceilometer |
Future | New | Undecided | MOS Ceilometer |
Bug Description
Fuel version: 6.1
Ceilometer does not fail over gracefully between RabbitMQ nodes. In active environments this causes problems when Ceilometer cannot connect to the next RabbitMQ server in its list. This was found in a customer environment. Steps to reproduce:
1. Kill RabbitMQ on the master node.
2. Tail the Ceilometer logs to watch it connect to the next RabbitMQ node in the cluster.
Observed: Ceilometer does not appear to switch to the next node correctly; instead it waits for the original RabbitMQ node to come back up. Once the cluster recovers with the original node, Ceilometer starts working again. However, if that node did not come back, Ceilometer would likely keep failing to connect to the RabbitMQ servers on the other nodes.
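The failover behaviour the report expects can be sketched as a client that, when its current RabbitMQ node is unreachable, moves on to the next host in its configured list instead of waiting for the dead one. This is a minimal Python simulation under that assumption: `connect_with_failover`, `fake_connect` and the `down` set are hypothetical names for illustration, not the oslo.messaging or kombu implementation. The host addresses are the ones that appear in the logs further down.

```python
import itertools
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], T],
    retry_interval: float = 1.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Cycle through `hosts` until `connect` succeeds on one of them.

    Hypothetical helper: a failed host is skipped in favour of the next one
    in the list, so a single dead node cannot stall the client.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    attempts = 0
    for host in itertools.cycle(hosts):
        try:
            return connect(host)
        except ConnectionError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise  # give up and surface the last connection error
            sleep(retry_interval)  # short pause, then try the NEXT host
    raise AssertionError("unreachable: itertools.cycle never ends")


# Simulated outage: the first node in the list is down (hypothetical setup).
down = {"192.168.26.51:5673"}
tried = []


def fake_connect(host: str) -> str:
    """Stand-in for a real AMQP connect call; records every host it is given."""
    tried.append(host)
    if host in down:
        raise ConnectionError(f"{host} is unreachable")
    return f"connection to {host}"


conn = connect_with_failover(
    ["192.168.26.51:5673", "192.168.26.70:5673", "192.168.26.71:5673"],
    fake_connect,
    sleep=lambda seconds: None,  # skip real delays in the simulation
)
print(conn)
print(tried)
```

With the first node down, the sketch reaches `192.168.26.70:5673` on its second attempt rather than retrying `192.168.26.51:5673`, which is the behaviour the report says it did not observe.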
Logs:
2015-10-06 16:28:45.706 5914 ERROR oslo_messaging.
2015-10-06 16:28:46.412 5914 ERROR oslo_messaging.
2015-10-06 16:28:46.459 5914 ERROR oslo_messaging.
2015-10-06 16:28:48.089 5914 ERROR oslo_messaging.
2015-10-06 16:28:48.335 5914 ERROR oslo_messaging.
2015-10-06 16:28:48.337 5914 ERROR oslo_messaging.
2015-10-06 16:28:49.538 5914 ERROR oslo_messaging.
2015-10-06 16:28:49.704 5914 INFO oslo_messaging.
2015-10-06 16:28:49.727 5914 INFO oslo_messaging.
2015-10-06 16:28:50.555 5914 ERROR oslo_messaging.
2015-10-06 16:28:52.593 5914 ERROR oslo_messaging.
2015-10-06 16:28:53.698 5914 INFO oslo_messaging.
2015-10-06 16:29:55.229 5914 ERROR oslo_messaging.
2015-10-06 16:29:56.415 5914 ERROR oslo_messaging.
2015-10-06 16:29:57.555 5914 INFO oslo_messaging.
2015-10-06 16:32:07.102 5914 ERROR oslo_messaging.
2015-10-06 16:32:07.314 5914 ERROR oslo_messaging.
2015-10-06 16:32:08.238 5914 ERROR oslo_messaging.
2015-10-06 16:32:08.298 5914 ERROR oslo_messaging.
2015-10-06 16:32:08.337 5914 ERROR oslo_messaging.
2015-10-06 16:32:09.256 5914 ERROR oslo_messaging.
2015-10-06 16:32:09.313 5914 ERROR oslo_messaging.
2015-10-06 16:32:09.420 5914 INFO oslo_messaging.
2015-10-06 16:32:10.270 5914 ERROR oslo_messaging.
2015-10-06 16:32:10.330 5914 ERROR oslo_messaging.
2015-10-06 16:32:11.357 5914 INFO oslo_messaging.
2015-10-06 16:32:12.291 5914 INFO oslo_messaging.
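The timestamps in this excerpt fall into three bursts, starting around 16:28:45, 16:29:55 and 16:32:07. For a longer log, a small Python helper can group lines the same way. It assumes each line starts with a 23-character `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in the excerpt; `group_bursts` and the 10-second gap are illustrative choices, not part of any existing tool.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # e.g. "2015-10-06 16:28:45.706"


def group_bursts(
    lines: Sequence[str], max_gap: float = 10.0
) -> List[Tuple[datetime, datetime, int]]:
    """Group time-ordered log lines into (first, last, count) bursts.

    A gap longer than `max_gap` seconds between consecutive lines starts a
    new burst. Only the first 23 characters (the timestamp) are parsed.
    """
    gap = timedelta(seconds=max_gap)
    bursts: List[Tuple[datetime, datetime, int]] = []
    for line in lines:
        ts = datetime.strptime(line[:23], TS_FORMAT)
        if bursts and ts - bursts[-1][1] <= gap:
            first, _, count = bursts[-1]
            bursts[-1] = (first, ts, count + 1)  # extend the current burst
        else:
            bursts.append((ts, ts, 1))  # gap too large: open a new burst
    return bursts


# Times of the 27 log lines in the excerpt above (all on 2015-10-06).
TIMES = [
    "16:28:45.706", "16:28:46.412", "16:28:46.459", "16:28:48.089",
    "16:28:48.335", "16:28:48.337", "16:28:49.538", "16:28:49.704",
    "16:28:49.727", "16:28:50.555", "16:28:52.593", "16:28:53.698",
    "16:29:55.229", "16:29:56.415", "16:29:57.555",
    "16:32:07.102", "16:32:07.314", "16:32:08.238", "16:32:08.298",
    "16:32:08.337", "16:32:09.256", "16:32:09.313", "16:32:09.420",
    "16:32:10.270", "16:32:10.330", "16:32:11.357", "16:32:12.291",
]
LINES = ["2015-10-06 " + t for t in TIMES]

for first, last, count in group_bursts(LINES):
    print(f"{first:%H:%M:%S} - {last:%H:%M:%S}: {count} lines")
```

On these timestamps it reports bursts of 12, 3 and 12 lines, separated by gaps of roughly one and two minutes.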
tags: added: customer-found
Changed in fuel:
  status: New → Incomplete
Changed in fuel:
  milestone: 6.1-updates → 9.0
  status: Confirmed → New
Maybe related to https://bugs.launchpad.net/mos/+bug/1393505
RabbitMQ was restarted once again in my environment, and this is what I saw:
2015-10-06 20:13:27.650 27487 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673
_drivers.impl_rabbit [-] AMQP server 192.168.26.71:5673 closed the connection. Check login credentials: So
_drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
_drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
_drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673
_drivers.impl_rabbit [-] AMQP server on 192.168.26.71:5673 is unreachable: timed out. Trying again in 30 s
_drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.71:5673
_drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
_drivers.impl_rabbit [-] AMQP server 192.168.26.51:5673 closed the connection. Check login credentials: So
_drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673
_drivers.impl_rabbit [-] AMQP server 192.168.26.51:5673 closed the connection. Check login credentials: So
_drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.70:5673
_drivers.impl_rabbit [-] AMQP server 192.168.26.51:5673 closed the connection. Check login credentials: So
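To check whether the driver actually rotates through the RabbitMQ hosts, the reconnect messages can be tallied per host. This Python sketch assumes the message wording shown in the excerpt above ("Connecting to AMQP server on ...", "AMQP server ... closed the connection", "AMQP server on ... is unreachable"); `reconnect_events` and `summarize` are hypothetical helpers, and other oslo.messaging releases may phrase these messages differently.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Patterns taken from the impl_rabbit messages quoted in this report; other
# oslo.messaging releases may phrase them differently (assumption).
EVENT_RE = re.compile(
    r"Connecting to AMQP server on (?P<connect>\S+:\d+)"
    r"|AMQP server (?:on )?(?P<fail>\S+:\d+) (?:closed the connection|is unreachable)"
)


def reconnect_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return ("connect" | "fail", host:port) pairs in the order they appear."""
    events = []
    for line in lines:
        # finditer also handles several messages fused onto one physical line.
        for match in EVENT_RE.finditer(line):
            if match.group("connect"):
                events.append(("connect", match.group("connect")))
            else:
                events.append(("fail", match.group("fail")))
    return events


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count connection attempts and failures per host."""
    events = reconnect_events(lines)
    attempts = Counter(host for kind, host in events if kind == "connect")
    failures = Counter(host for kind, host in events if kind == "fail")
    return attempts, failures


# A few messages from the excerpt above (timestamps and levels dropped).
SAMPLE = [
    "_drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673",
    "_drivers.impl_rabbit [-] AMQP server 192.168.26.71:5673 closed the connection. Check login credentials: So",
    "_drivers.impl_rabbit [-] AMQP server on 192.168.26.71:5673 is unreachable: timed out. Trying again in 30 s",
    "_drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.70:5673",
]

attempts, failures = summarize(SAMPLE)
print(dict(attempts))
print(dict(failures))
```

Run over a full log, a client that keeps returning to the same host, or a host that only ever shows up under failures, would be consistent with the non-rotating behaviour described in this report.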
2015-10-06 20:13:27.663 27487 ERROR oslo.messaging.
cket closed
2015-10-06 20:13:27.778 27484 INFO oslo.messaging.
2015-10-06 20:13:27.906 27483 INFO oslo.messaging.
2015-10-06 20:13:27.923 27486 INFO oslo.messaging.
2015-10-06 20:13:27.939 27486 ERROR oslo.messaging.
econds.
2015-10-06 20:13:27.944 27482 INFO oslo.messaging.
2015-10-06 20:13:27.952 27492 INFO oslo.messaging.
2015-10-06 20:13:27.957 27482 ERROR oslo.messaging.
cket closed
2015-10-06 20:13:28.128 27481 INFO oslo.messaging.
2015-10-06 20:13:28.148 27481 ERROR oslo.messaging.
cket closed
2015-10-06 20:13:28.258 27491 INFO oslo.messaging.
2015-10-06 20:13:28.270 27491 ERROR oslo.messaging.
cket closed