ceilometer-rabbitmq-failover

Bug #1503363 reported by Michael Petersen
This bug affects 2 people
Affects             Status     Importance  Assigned to     Milestone
Fuel for OpenStack  New        High        MOS Ceilometer
6.1.x               Confirmed  High        MOS Ceilometer
7.0.x               New        Undecided   MOS Ceilometer
8.0.x               New        Undecided   MOS Ceilometer
Future              New        Undecided   MOS Ceilometer

Bug Description

Fuel - 6.1

The non-graceful failover of RabbitMQ connections in Ceilometer causes issues in active environments when Ceilometer cannot connect to the next RabbitMQ system in line. This was found in a customer environment. When trying to reproduce:

Kill RabbitMQ on the master node
Tail the Ceilometer logs to watch connections to the next RabbitMQ cluster. It doesn't look like it actually does this correctly; it appears to wait for the original cluster to come back up.

The cluster eventually comes back up with the original node and Ceilometer starts working again. However, if the node did not come back up, it looks like Ceilometer would continue to have issues connecting to the RabbitMQ servers on the other nodes.
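For reference, the failover behaviour expected here can be sketched in a few lines of Python. This is only an illustration of round-robin failover over a list of broker hosts, not the oslo.messaging implementation: `connect_with_failover` and `try_connect` are hypothetical names, and a real client would also sleep between attempts (the "Trying again in N seconds" messages in the logs).

```python
import itertools
from typing import Callable, Iterable, List, Tuple


def connect_with_failover(
    hosts: Iterable[str],
    try_connect: Callable[[str], bool],
    max_attempts: int = 10,
) -> Tuple[str, List[str]]:
    """Cycle through the hosts until one accepts a connection.

    Returns the host that accepted the connection and the list of hosts
    that were tried and failed before it. Raises ConnectionError once
    max_attempts attempts have failed (or if no hosts were given).
    """
    failed: List[str] = []
    # itertools.cycle moves on to the next host after every failure and
    # wraps around to the first host at the end of the list.
    for attempt, host in enumerate(itertools.cycle(list(hosts)), start=1):
        if try_connect(host):
            return host, failed
        failed.append(host)
        if attempt >= max_attempts:
            break
    raise ConnectionError(f"no broker reachable, tried: {failed}")


if __name__ == "__main__":
    # Hypothetical scenario: the first node is down, the others are up.
    down = {"192.168.0.5:5673"}
    host, failed = connect_with_failover(
        ["192.168.0.5:5673", "192.168.0.7:5673", "192.168.0.4:5673"],
        lambda h: h not in down,
    )
    print(host, failed)
```

With one node down, a client following this pattern should settle on the next host in the list instead of waiting for the original node to return.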

Logs:

2015-10-06 16:28:45.706 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.5:5673 closed the connection. Check login credentials: Socket closed
2015-10-06 16:28:46.412 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.5:5673 closed the connection. Check login credentials: Socket closed
2015-10-06 16:28:46.459 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.5:5673 closed the connection. Check login credentials: Socket closed
2015-10-06 16:28:48.089 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:28:48.335 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:28:48.337 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:28:49.538 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:28:49.704 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.7:5673
2015-10-06 16:28:49.727 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.7:5673
2015-10-06 16:28:50.555 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.
2015-10-06 16:28:52.593 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.5:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:28:53.698 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.4:5673
2015-10-06 16:29:55.229 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.4:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.
2015-10-06 16:29:56.415 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.4:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:29:57.555 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.7:5673
2015-10-06 16:32:07.102 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.7:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.
2015-10-06 16:32:07.314 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.7:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.
2015-10-06 16:32:08.238 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.7:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:32:08.298 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.7:5673 is unreachable: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'. Trying again in 1 seconds.
2015-10-06 16:32:08.337 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.7:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:32:09.256 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.4:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:32:09.313 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.7:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:32:09.420 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.5:5673
2015-10-06 16:32:10.270 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.4:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.
2015-10-06 16:32:10.330 5914 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.4:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2015-10-06 16:32:11.357 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.5:5673
2015-10-06 16:32:12.291 5914 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.168.0.5:5673
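
To make a log like the one above easier to read, the events can be counted per broker host. The following is a rough sketch that assumes the three message formats seen in this log ("closed the connection", "is unreachable", "Reconnected to"); the names `EVENT_PATTERNS` and `summarize` are made up for this example, and other oslo.messaging versions may word these messages differently.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Patterns derived from the log excerpt above; each captures host:port.
EVENT_PATTERNS = {
    "reconnected": re.compile(r"Reconnected to AMQP server on (\S+)"),
    "unreachable": re.compile(r"AMQP server on (\S+) is unreachable"),
    "closed": re.compile(r"AMQP server (\S+) closed the connection"),
}


def summarize(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count (host, event) pairs found in impl_rabbit log lines."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        for event, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
                break  # a line describes at most one event
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_rabbit_log.py < ceilometer.log
    for (host, event), count in sorted(summarize(sys.stdin).items()):
        print(f"{host:24} {event:12} {count}")
```

Counting "unreachable" against "reconnected" per host shows whether the service actually moved to another node or kept retrying the one that went down.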

tags: added: customer-found
Andrey Grebennikov (agrebennikov) wrote :

Maybe related to https://bugs.launchpad.net/mos/+bug/1393505

My RabbitMQ got restarted once again, and this is what I saw:

2015-10-06 20:13:27.650 27487 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673
2015-10-06 20:13:27.663 27487 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.26.71:5673 closed the connection. Check login credentials: Socket closed
2015-10-06 20:13:27.778 27484 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-06 20:13:27.906 27483 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-06 20:13:27.923 27486 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673
2015-10-06 20:13:27.939 27486 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.26.71:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-06 20:13:27.944 27482 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.71:5673
2015-10-06 20:13:27.952 27492 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-06 20:13:27.957 27482 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.26.51:5673 closed the connection. Check login credentials: Socket closed
2015-10-06 20:13:28.128 27481 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.51:5673
2015-10-06 20:13:28.148 27481 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.26.51:5673 closed the connection. Check login credentials: Socket closed
2015-10-06 20:13:28.258 27491 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.26.70:5673
2015-10-06 20:13:28.270 27491 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.26.51:5673 closed the connection. Check login credentials: Socket closed

Vadim Rovachev (vrovachev) wrote :

Please specify the Fuel version.

Changed in fuel:
assignee: nobody → MOS Ceilometer (mos-ceilometer)
importance: Undecided → High
tags: added: ceilometer
Changed in fuel:
status: New → Incomplete
Michael Petersen (mpetason) wrote :

The Fuel version in question is 6.1.

description: updated
Timur Nurlygayanov (tnurlygayanov) wrote :

It looks like we have an answer to the original question, so the status is changed to Confirmed.

Changed in fuel:
milestone: none → 6.1-updates
status: Incomplete → Confirmed
Nadya Privalova (nprivalova) wrote :

This bug is already known: https://bugs.launchpad.net/mos/+bug/1510916. Please close this one as a duplicate.

Changed in fuel:
milestone: 6.1-updates → 9.0
status: Confirmed → New