Ceilometer collector cannot reconnect to rabbitmq after RabbitMQ failover

Bug #1510916 reported by Vitaly Gusev
This bug affects 3 people
Affects             Status        Importance  Assigned to      Milestone
Mirantis OpenStack  Fix Released  High        MOS Ceilometer
5.1.x               In Progress   High        MOS Ceilometer
6.1.x               Fix Released  High        MOS Maintenance

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.0"
  api: "1.0"
  build_number: "58"
  build_id: "2014-12-26_14-25-46"
  astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
  fuellib_sha: "fde8ba5e11a1acaf819d402c645c731af450aff0"
  ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
  nailgun_sha: "5f91157daa6798ff522ca9f6d34e7e135f150a90"
  fuelmain_sha: "81d38d6f2903b5a8b4bee79ca45a54b76c1361b8"

oslo.messaging version:
On the 6.0 environment:
root@node-2:~# apt-cache policy python-oslo.messaging
python-oslo.messaging:
  Installed: 1.4.1-fuel6.0~mira18

On the 6.1 environment (Verizon):
python-oslo.messaging_1.4.1-1~u14.04+mos13_all.deb

Steps to reproduce:
1. Deploy a cluster with the following parameters:
    3 controllers+mongo, KVM, 5 GB RAM
    1 compute+ceph, Supermicro, 16 GB RAM
    Sahara and Ceilometer enabled; Ceph for volumes, images, and ephemeral volumes

2. Disable rabbitmq:
pcs resource disable master_p_rabbitmq-server
Wait until the master and slaves have stopped.

3. Enable rabbitmq:
pcs resource enable master_p_rabbitmq-server
Wait until the master and slaves have started.

Expected result:
The Ceilometer collector successfully reconnects to rabbitmq.

Actual result:
On all controller nodes, /var/log/ceilometer/ceilometer-collector.log shows the following errors:

2015-10-28 11:43:01.113 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:32.154 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
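
To make a log excerpt like the one above easier to triage, here is a minimal sketch that counts ERROR lines and lists the AMQP endpoints the collector tried. The function name summarize_reconnects and the regular expressions are assumptions written for this illustration, derived only from the line layout shown above; they are not part of ceilometer or oslo.messaging.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "<date> <time> <pid> <LEVEL> <logger> [ctx] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[[^\]]*\] (?P<msg>.*)$"
)
CONNECTING = re.compile(r"Connecting to AMQP server on (\S+)")


def summarize_reconnects(log_text):
    """Return the endpoints tried (in order) and the number of ERROR lines."""
    levels = Counter()
    attempts = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        levels[match.group("level")] += 1
        hit = CONNECTING.search(match.group("msg"))
        if hit:
            attempts.append(hit.group(1))
    return {"attempts": attempts, "errors": levels["ERROR"]}


# Four of the lines quoted in this report, used as sample input.
SAMPLE_LOG = """\
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
"""

summary = summarize_reconnects(SAMPLE_LOG)
print(summary)
```

Run against a full collector log, a growing error count with no later successful connection would suggest that the collector is still stuck in the retry loop.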

The metering.sample queue in RabbitMQ is not empty:
root@node-1:~# rabbitmqctl list_queues | grep metering
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
q-metering-plugin_fanout_1feb6540cda34d758611354495b98bfb 0
q-metering-plugin_fanout_64fb6a3997c44ecea91bbf617a8920d5 0
q-metering-plugin_fanout_bae52586cac2478b960d851b690f1494 0
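
The same check can be scripted. The sketch below parses text in the "name count" form printed by rabbitmqctl list_queues above and reports metering queues that still hold messages. The helper names parse_queue_depths and non_empty_queues are hypothetical, and the parser assumes only the two-column output shown in this report.

```python
def parse_queue_depths(output):
    """Map queue name -> message count from 'name count' lines."""
    depths = {}
    for line in output.splitlines():
        parts = line.split()
        # Ignore banner or footer lines that are not "<name> <integer>".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        depths[parts[0]] = int(parts[1])
    return depths


def non_empty_queues(depths, substring="metering"):
    """Return queues whose name contains `substring` and that hold messages."""
    return {name: count for name, count in depths.items()
            if substring in name and count > 0}


# Part of the listing quoted in this report, used as sample input.
SAMPLE_OUTPUT = """\
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
"""

backlog = non_empty_queues(parse_queue_depths(SAMPLE_OUTPUT))
print(backlog)
```

A backlog in metering.sample that keeps growing after the failover is consistent with the collectors not consuming from the queue.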

After ~10 minutes the collector on one controller reconnects to rabbitmq, but the collectors on the other two controllers do not.
By contrast, all ceilometer-agent-notification services successfully reconnect to rabbitmq after the rabbit restart.

Workaround: restart ceilometer-collector; after the restart the collector successfully connects to rabbitmq.

Eugene Nikanorov (enikanorov) wrote :

This might be a known issue with oslo.messaging in 6.0.
Why file it if we have maintenance updates for 6.0 that fix the bug?

Changed in mos:
status: New → Incomplete
importance: High → Undecided
Nadya Privalova (nprivalova) wrote :

The issue is also seen on 6.1, so I believe it is not fixed by 6.0-updates.

Changed in mos:
assignee: nobody → MOS Ceilometer (mos-ceilometer)
tags: added: ceilometer customer-found
Changed in mos:
importance: Undecided → High
description: updated
Nadya Privalova (nprivalova) wrote :

Confirmed by Dmitry Mescherekov

Changed in mos:
status: Incomplete → Confirmed
no longer affects: oslo.messaging
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (openstack-ci/fuel-6.0-updates/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0-updates/2014.2
Change author: Nadya Shakhat <email address hidden>
Review: https://review.fuel-infra.org/13639

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Nadya Shakhat <email address hidden>
Review: https://review.fuel-infra.org/13640

Vitaly Sedelnik (vsedelnik) wrote :

Please don't change a bug's importance just because it was found at a customer site; adding the customer-found tag is enough. We want the Importance field to reflect the actual impact of the issue.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (openstack-ci/fuel-5.1/2014.1.1)

Fix proposed to branch: openstack-ci/fuel-5.1/2014.1.1
Change author: Nadya Shakhat <email address hidden>
Review: https://review.fuel-infra.org/13808

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (openstack-ci/fuel-5.1-updates/2014.1.1)

Fix proposed to branch: openstack-ci/fuel-5.1-updates/2014.1.1
Change author: Nadya Shakhat <email address hidden>
Review: https://review.fuel-infra.org/13809

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/ceilometer (openstack-ci/fuel-5.1/2014.1.1)

Change abandoned by Nadya Shakhat <email address hidden> on branch: openstack-ci/fuel-5.1/2014.1.1
Review: https://review.fuel-infra.org/13808

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/ceilometer (openstack-ci/fuel-5.1-updates/2014.1.1)

Reviewed: https://review.fuel-infra.org/13809
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-5.1-updates/2014.1.1

Commit: 3d52f3002ba79ca0fd42139fd66bc1944ce4c6cf
Author: Nadya Shakhat <email address hidden>
Date: Wed Nov 11 11:39:19 2015

Have eventlet monkeypatch the time module

Without this, mongod retry logic in the various services started as
commands fails to behave as expected and does not reconnect as soon as
the mongod service has returned to availability.

In addition to the mongod sleep there are two other time.sleep calls
that may be reached by this change. Review and discussion with others
indicates that their behavior will continue to be correct with the
monkeypatch in place.

Cherry-pick from https://review.openstack.org/#/c/176751/
Closes-Bug: 1510916

Change-Id: I55b625fb5b817df45722abf6c38325b3c785fc34
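
As an illustration of the idea in this commit message, the sketch below assumes the change amounts to asking eventlet to patch the time module, so that time.sleep in a retry loop yields to other green threads instead of blocking the whole process. It is not the actual ceilometer patch: run_demo, retry_loop and consumer are hypothetical names, and the demo returns None if eventlet is not installed.

```python
import time


def run_demo():
    """Show that a patched time.sleep lets other green threads run."""
    try:
        import eventlet
    except ImportError:
        return None  # eventlet is not installed; nothing to demonstrate

    # Assumption: the fix enables time patching at service start-up.
    eventlet.monkey_patch(time=True)

    events = []

    def retry_loop():
        # Stands in for a reconnect loop that sleeps between attempts.
        events.append("retry: sleeping")
        time.sleep(0.05)  # cooperative once the time module is patched
        events.append("retry: woke up")

    def consumer():
        # Stands in for other work, such as consuming from a queue.
        events.append("consumer: ran")

    pool = eventlet.GreenPool()
    pool.spawn(retry_loop)
    pool.spawn(consumer)
    pool.waitall()
    return events


events = run_demo()
print(events)
```

With the patch in place the consumer gets to run while the retry loop sleeps. Without it, time.sleep would hold the only OS thread for the full delay and every other green thread in the process would wait.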

Changed in mos:
milestone: 6.0-updates → 6.0-mu-7
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/ceilometer (openstack-ci/fuel-6.0-updates/2014.2)

Reviewed: https://review.fuel-infra.org/13639
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.0-updates/2014.2

Commit: f1706650385424f96683eac24ae4435aa510299f
Author: Nadya Shakhat <email address hidden>
Date: Wed Nov 11 19:07:42 2015

Have eventlet monkeypatch the time module

Without this, mongod retry logic in the various services started as
commands fails to behave as expected and does not reconnect as soon as
the mongod service has returned to availability.

In addition to the mongod sleep there are two other time.sleep calls
that may be reached by this change. Review and discussion with others
indicates that their behavior will continue to be correct with the
monkeypatch in place.

Cherry-pick from https://review.openstack.org/#/c/176751/
Closes-Bug: 1510916

Change-Id: I3cca717ab83a31ebbfb0b639e5d8390e59821505

Changed in mos:
status: Confirmed → Fix Committed
Vadim Rovachev (vrovachev) wrote :

Verified on 5.1.1.

Denis Meltsaykin (dmeltsaykin) wrote :

Returning this to the In Progress state, as the fix was wrongly committed to the 5.1-updates branch instead of the 5.1.1-updates branch.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (openstack-ci/fuel-5.1.1-updates/2014.1.1)

Fix proposed to branch: openstack-ci/fuel-5.1.1-updates/2014.1.1
Change author: Nadya Shakhat <email address hidden>
Review: https://review.fuel-infra.org/14019

Nadya Privalova (nprivalova) wrote :

The fix was backported to 5.1.1-updates. See the previous comment.

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/ceilometer (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/13640
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 198da766de4d39fabd1ba52a80e1dfa4aeb2f5d6
Author: Nadya Shakhat <email address hidden>
Date: Wed Nov 11 19:00:21 2015

Have eventlet monkeypatch the time module

Without this, mongod retry logic in the various services started as
commands fails to behave as expected and does not reconnect as soon as
the mongod service has returned to availability.

In addition to the mongod sleep there are two other time.sleep calls
that may be reached by this change. Review and discussion with others
indicates that their behavior will continue to be correct with the
monkeypatch in place.

Cherry-pick from https://review.openstack.org/#/c/176751/
Closes-Bug: 1510916

Change-Id: I899066be6f65b8c8a202e3d994c5b81533e5a5e4

Vitaly Gusev (vgusev) wrote :

Verified on Ubuntu 6.0.
Packages:
ceilometer-agent-central,ceilometer-agent-notification,ceilometer-alarm-evaluator,ceilometer-alarm-notifier,ceilometer-api,ceilometer-collector,ceilometer-common,python-ceilometer
Version:
2014.2-fuel6.0~mira25

Changed in mos:
status: Fix Committed → Fix Released
Roman Rufanov (rrufanov)
tags: added: support
tags: added: on-verification
Vitaly Gusev (vgusev) wrote :

Verified on Ubuntu 6.1
Packages:
ceilometer-agent-central,ceilometer-agent-notification,ceilometer-alarm-evaluator,ceilometer-alarm-notifier,ceilometer-api,ceilometer-collector,ceilometer-common,python-ceilometer
Version:
2014.2.2-1~u14.04+mos13

tags: removed: on-verification