2015-10-28 14:04:59 |
Nadya Privalova |
description |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.0"
api: "1.0"
build_number: "58"
build_id: "2014-12-26_14-25-46"
astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
fuellib_sha: "fde8ba5e11a1acaf819d402c645c731af450aff0"
ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
nailgun_sha: "5f91157daa6798ff522ca9f6d34e7e135f150a90"
fuelmain_sha: "81d38d6f2903b5a8b4bee79ca45a54b76c1361b8"
Steps to reproduce:
1. Deploy cluster with the following parameters:
3 controllers+mongo, KVM, 5 GB RAM
1 compute+ceph, Supermicro, 16 GB RAM
Sahara, Ceilometer enabled, Ceph for volumes, Ceph for images, Ceph for ephemeral volumes
2. Disable rabbitmq:
pcs resource disable master_p_rabbitmq-server
wait until the master and slaves are stopped
3. Enable rabbitmq:
pcs resource enable master_p_rabbitmq-server
wait until the master and slaves are started
Expected result:
Ceilometer collector successfully reconnects to rabbitmq
Actual result:
On all controller nodes in /var/log/ceilometer/ceilometer-collector.log we can see the following errors:
2015-10-28 11:43:01.113 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:32.154 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
Metering queue in rabbit is not empty:
root@node-1:~# rabbitmqctl list_queues | grep metering
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
q-metering-plugin_fanout_1feb6540cda34d758611354495b98bfb 0
q-metering-plugin_fanout_64fb6a3997c44ecea91bbf617a8920d5 0
q-metering-plugin_fanout_bae52586cac2478b960d851b690f1494 0
After ~10 minutes the collector on one controller reconnects to rabbitmq, but the collectors on the other two controllers don't.
For comparison, all ceilometer-agent-notification services successfully reconnect to rabbitmq after the rabbit restart.
Workaround: restart ceilometer-collector; after this, the collector successfully connects to rabbitmq. |
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.0"
api: "1.0"
build_number: "58"
build_id: "2014-12-26_14-25-46"
astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
fuellib_sha: "fde8ba5e11a1acaf819d402c645c731af450aff0"
ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
nailgun_sha: "5f91157daa6798ff522ca9f6d34e7e135f150a90"
fuelmain_sha: "81d38d6f2903b5a8b4bee79ca45a54b76c1361b8"
oslo.messaging version:
at 6.0 env:
root@node-2:~# apt-cache policy python-oslo.messaging
python-oslo.messaging:
Installed: 1.4.1-fuel6.0~mira18
at 6.1 env (Verizon):
python-oslo.messaging_1.4.1-1~u14.04+mos13_all.deb
Steps to reproduce:
1. Deploy cluster with the following parameters:
3 controllers+mongo, KVM, 5 GB RAM
1 compute+ceph, Supermicro, 16 GB RAM
Sahara, Ceilometer enabled, Ceph for volumes, Ceph for images, Ceph for ephemeral volumes
2. Disable rabbitmq:
pcs resource disable master_p_rabbitmq-server
wait until the master and slaves are stopped
3. Enable rabbitmq:
pcs resource enable master_p_rabbitmq-server
wait until the master and slaves are started
Expected result:
Ceilometer collector successfully reconnects to rabbitmq
Actual result:
On all controller nodes in /var/log/ceilometer/ceilometer-collector.log we can see the following errors:
2015-10-28 11:43:01.113 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:32.154 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
Metering queue in rabbit is not empty:
root@node-1:~# rabbitmqctl list_queues | grep metering
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
q-metering-plugin_fanout_1feb6540cda34d758611354495b98bfb 0
q-metering-plugin_fanout_64fb6a3997c44ecea91bbf617a8920d5 0
q-metering-plugin_fanout_bae52586cac2478b960d851b690f1494 0
After ~10 minutes the collector on one controller reconnects to rabbitmq, but the collectors on the other two controllers don't.
For comparison, all ceilometer-agent-notification services successfully reconnect to rabbitmq after the rabbit restart.
Workaround: restart ceilometer-collector; after this, the collector successfully connects to rabbitmq. |
|
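The report checks whether the collector is consuming by reading the depth of the metering queues in the `rabbitmqctl list_queues` output. That check could be scripted roughly as follows. This is a minimal sketch, not part of the original report: the function name `nonempty_queues` and the `SAMPLE` text are hypothetical, and it assumes the default two-column `name depth` output shown in the description.

```python
def nonempty_queues(listing, pattern="metering"):
    """Return {queue_name: depth} for queues matching pattern with depth > 0.

    `listing` is the text printed by `rabbitmqctl list_queues`
    (assumed here to be the default "name depth" layout).
    """
    result = {}
    for line in listing.splitlines():
        parts = line.split()
        # Skip banner lines such as "Listing queues ..." and blank lines.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, depth = parts[0], int(parts[1])
        # Same substring match as `grep metering` in the report.
        if pattern in name and depth > 0:
            result[name] = depth
    return result


# Hypothetical sample, copied from the queue listing in the description.
SAMPLE = """\
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
"""

if __name__ == "__main__":
    print(nonempty_queues(SAMPLE))
```

If `metering.sample` still appears in the result some minutes after rabbitmq is re-enabled, or after the workaround restart, the collector on that node has probably not resumed consuming.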