Comment 0 for bug 1510916

Vitaly Gusev (vgusev) wrote :

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.0"
  api: "1.0"
  build_number: "58"
  build_id: "2014-12-26_14-25-46"
  astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
  fuellib_sha: "fde8ba5e11a1acaf819d402c645c731af450aff0"
  ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
  nailgun_sha: "5f91157daa6798ff522ca9f6d34e7e135f150a90"
  fuelmain_sha: "81d38d6f2903b5a8b4bee79ca45a54b76c1361b8"

Steps to reproduce:
1. Deploy cluster with the following parameters:
    3 controllers+mongo, KVM, 5 GB RAM
    1 compute+ceph, Supermicro, 16 GB RAM
    Sahara and Ceilometer enabled; Ceph for volumes, images, and ephemeral volumes

2. Disable rabbitmq:
pcs resource disable master_p_rabbitmq-server
Wait until the master and slaves have stopped.

3. Enable rabbitmq:
pcs resource enable master_p_rabbitmq-server
Wait until the master and slaves have started.

Expected result:
The Ceilometer collector successfully reconnects to RabbitMQ.

Actual result:
On all controller nodes, /var/log/ceilometer/ceilometer-collector.log shows the following errors:

2015-10-28 11:43:01.113 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:32.154 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
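To compare the behaviour across controllers, the ERROR lines can be pulled out of the collector log with a small script. This is only a sketch: the regular expressions are derived from the excerpt above (including the `[-]` context marker), and the function name `amqp_errors` is hypothetical, not part of Ceilometer or oslo.messaging.

```python
import re

# Line layout assumed from the excerpt above:
# "<date> <time> <pid> <LEVEL> <logger> [-] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[-\] (?P<msg>.*)$"
)
# First IPv4 "host:port" pair mentioned in a message.
ADDR_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def amqp_errors(log_text):
    """Return a (timestamp, host, port, message) tuple for each ERROR line."""
    errors = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group("level") != "ERROR":
            continue
        addr = ADDR_RE.search(m.group("msg"))
        host = addr.group(1) if addr else None
        port = int(addr.group(2)) if addr else None
        errors.append((m.group("ts"), host, port, m.group("msg")))
    return errors


SAMPLE = (
    "2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] "
    "Connecting to AMQP server on 192.168.0.3:5673\n"
    "2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] "
    "AMQP server on 192.168.0.3:5673 is unreachable: timed out. "
    "Trying again in 30 seconds.\n"
    "2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] "
    "AMQP server 192.168.0.3:5673 closed the connection. "
    "Check login credentials: Socket closed\n"
)

if __name__ == "__main__":
    for ts, host, port, msg in amqp_errors(SAMPLE):
        print(ts, host, port, msg)
```

Running it against the log from each controller gives a quick view of which nodes are still failing and when the last error occurred.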

The metering queue in RabbitMQ is not empty:
root@node-1:~# rabbitmqctl list_queues | grep metering
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
q-metering-plugin_fanout_1feb6540cda34d758611354495b98bfb 0
q-metering-plugin_fanout_64fb6a3997c44ecea91bbf617a8920d5 0
q-metering-plugin_fanout_bae52586cac2478b960d851b690f1494 0
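The same check can be scripted when watching whether the backlog drains after a collector reconnects. The sketch below assumes the two-column "name count" output shown above; the function name `nonempty_queues` and its `name_filter` parameter are hypothetical.

```python
def nonempty_queues(listing, name_filter="metering"):
    """Return {queue_name: message_count} for matching queues that hold messages."""
    result = {}
    for line in listing.splitlines():
        parts = line.split()
        # Skip header/footer lines and anything that is not "name count".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, count = parts[0], int(parts[1])
        if name_filter in name and count > 0:
            result[name] = count
    return result


# Excerpt of the listing from this report.
SAMPLE = """\
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
"""

if __name__ == "__main__":
    print(nonempty_queues(SAMPLE))
```

In practice the listing would come from the output of `rabbitmqctl list_queues` on a controller; a count on metering.sample that keeps growing suggests that no collector is consuming from the queue.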

After ~10 minutes the collector on one controller reconnects to RabbitMQ, but the collectors on the other two controllers do not.
By contrast, all ceilometer-agent-notification services successfully reconnect to RabbitMQ after the RabbitMQ restart.

Workaround: restart ceilometer-collector; after the restart the collector successfully connects to RabbitMQ.