After restart of 'rabbitmq-server' service on all controllers, previously generated messages were lost
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Status tracked in 10.0.x | | |
10.0.x | Invalid | High | Dmitry Mescheryakov |
Bug Description
Hello,
Please take a look at the following issue:
After a restart of the 'rabbitmq-server' service on all controllers, messages generated with 'oslo.messaging
Note: I am NOT restarting the service on ALL controllers at ONCE.
My actions are:
- Restart the rabbit service on one controller;
- Wait until it is up and running;
- Wait until this controller is present in the cluster;
- Perform the same actions on the next controller.
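The "wait until it is up" steps above can be sketched as a small polling helper (a hypothetical sketch: `wait_until` and the timeout values are my own names, not part of MOS; on a real controller the condition would be a `rabbitmqctl cluster_status` check):

```shell
# wait_until: retry a command until it succeeds or the timeout (seconds)
# expires. Hypothetical helper illustrating how to wait for a restarted
# node before moving on to the next controller.
wait_until() {
  local timeout=$1; shift
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 1
  done
}

# Example; on a real node the condition would be something like:
#   wait_until 60 sh -c "rabbitmqctl cluster_status | grep -q running_nodes"
wait_until 5 true && echo "node is back"   # prints "node is back"
```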
My env is MOS 9.0 (ISO: fuel-9.
with 3 controllers and 1 compute-cinder node.
Actions performed from controller(s):
1) OK - Install oslo.messaging-
# apt-get update
# apt-get install git python-pip python-dev -y
# cd /root/
# git clone https:/
# cd /root/oslo.
# pip install -r requirements.txt -r test-requiremen
# dpkg -i oslo.messaging-
# apt-get -f install -y
2) OK - Get nodes inside RabbitMQ cluster:
# rabbitmqctl cluster_status | grep -A1 'running_nodes'
{running_
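The check in step 2 can be scripted; below is a sketch (`count_running_nodes` is my own name, and the sample line mimics the `rabbitmqctl cluster_status` output format for the three nodes above):

```shell
# count_running_nodes: read `rabbitmqctl cluster_status` output on stdin
# and print how many distinct nodes appear in the running_nodes list.
count_running_nodes() {
  grep -A1 'running_nodes' | grep -o "rabbit@[[:alnum:].-]*" | sort -u | wc -l
}

# Example with captured output (node names as in the cluster above):
sample="{running_nodes,['rabbit@node-1','rabbit@node-2','rabbit@node-4']}"
printf '%s\n' "$sample" | count_running_nodes   # prints 3
```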
3) OK - Get IPs of nodes:
# getent hosts node-1 --> 10.109.1.4 node-1.
# getent hosts node-2 --> 10.109.1.7 node-2.
# getent hosts node-4 --> 10.109.1.6 node-4.
4) OK - Fill oslo config file:
# cat /root/oslo.
[DEFAULT]
debug=true
[oslo_
rabbit_hosts = 10.109.1.4:5673, 10.109.1.6:5673, 10.109.1.7:5673
rabbit_userid = nova
rabbit_password = Ajl9OxOMW2mgB7f
5) OK - Generate and consume 10000 messages without rabbitmq-server restart:
# cd /root/oslo.
# oslo_msg_
# oslo_msg_
>>> OK - Consumed 10000 messages
6) OK - Generate and consume 10000 messages WITH rabbitmq-server restart on ONE controller:
# cd /root/oslo.
# oslo_msg_
# service rabbitmq-server restart && sleep 10
# rabbitmqctl cluster_status | grep -A1 'running_nodes' # check that all 3 nodes are present
# oslo_msg_
>>> OK - Consumed 10000 messages
7) NOK - Generate and consume 10000 messages WITH rabbitmq-server restart on ALL controllers:
On first controller:
# cd /root/oslo.
# oslo_msg_
Restart each controller one by one and check the cluster status after each restart:
# rabbitmqctl list_queues slave_pids name | grep `hostname` | wc -l # remember the number before the restart (for me: 20)
# service rabbitmq-server restart && sleep 10
# rabbitmqctl cluster_status | grep -A1 'running_nodes' # check that all 3 nodes are present
# rabbitmqctl list_queues slave_pids name | grep `hostname` | wc -l # check that the number is the same after the restart (for me: 20)
After that, perform the same commands on the next controller.
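The per-controller procedure above can be gathered into one script (a sketch only: `restart_and_verify` is a name I made up, the 10-second sleep and `rabbitmqctl` calls are the ones from the steps above, and a live RabbitMQ node is required):

```shell
# restart_and_verify: restart rabbitmq-server on this controller and check
# that the number of queues mirrored here is unchanged afterwards.
# Hypothetical wrapper around the manual steps above.
restart_and_verify() {
  local before after
  before=$(rabbitmqctl list_queues slave_pids name | grep "$(hostname)" | wc -l)
  service rabbitmq-server restart && sleep 10
  rabbitmqctl cluster_status | grep -A1 'running_nodes'   # expect all 3 nodes
  after=$(rabbitmqctl list_queues slave_pids name | grep "$(hostname)" | wc -l)
  if [ "$before" -ne "$after" ]; then
    echo "mirror count changed: $before -> $after" >&2
    return 1
  fi
}
```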
On first controller:
# oslo_msg_
>>> NOK - Consumed 0 messages # should be 10000
Retry on the first controller after 5 minutes:
# oslo_msg_
>>> NOK - Consumed 0 messages # should be 10000
root@node-2:~# rabbitmqctl list_policies
Listing policies ...
/ ha-notif all ^(event|
/ heat_rpc_expire all ^heat-engine-
/ tasks_expire all ^tasks\\. {"expires":3600000} 1
/ results_expire all ^results\\. {"expires":3600000} 1
description: | updated |
tags: added: swarm-blocker
Changed in mos: status: Confirmed → In Progress
tags: added: 10.0-reviewed
Alexander, this is designed and expected behaviour. MOS does not support preserving queued messages when ALL RabbitMQ services are restarted at once. If the services are restarted one by one and there is enough time for RabbitMQ to sync, everything will be OK.
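One way to check the sync referred to above, before restarting the next controller, is to compare each queue's mirror list with its synchronised mirror list (a sketch: `unsynced_queues` is my own name; `slave_pids` and `synchronised_slave_pids` are standard `rabbitmqctl list_queues` columns for mirrored queues):

```shell
# unsynced_queues: given tab-separated output of
#   rabbitmqctl list_queues name slave_pids synchronised_slave_pids
# on stdin, print the names of queues whose mirrors are not all synchronised.
unsynced_queues() {
  awk -F'\t' '$2 != $3 { print $1 }'
}

# Example with captured output (queue names are hypothetical):
printf 'q1\t[<a>]\t[<a>]\nq2\t[<a>,<b>]\t[<a>]\n' | unsynced_queues   # prints q2
```

On a live controller this would be fed directly: `rabbitmqctl list_queues name slave_pids synchronised_slave_pids | unsynced_queues`; an empty result means all mirrors are in sync.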