Mirantis OpenStack

RabbitMQ cluster fault after Pacemaker OCF parameter changed

Bug #1546286 reported by Andrii Petrenko on 2016-02-16

This bug report is a duplicate of: Bug #1496386: Pacemaker tries to start rabbit eternally.

This bug affects 1 person

Affects             Status  Importance  Assigned to  Milestone
Mirantis OpenStack  New     High        Unassigned
6.1.x               New     Critical    Unassigned   Mirantis OpenStack 6.1-mu-5

Bug Description

After updating the OCF parameters of the RabbitMQ cluster with the command "crm configure edit p_rabbitmq-server"

from:

params node_port=5673 debug=false command_timeout="--signal=KILL" erlang_cookie=EOKOWXQREETZSHFNTPEY max_rabbitmqctl_timeouts=3 \

to:

params node_port=5673 debug=false command_timeout="--signal=KILL" erlang_cookie=EOKOWXQREETZSHFNTPEY max_rabbitmqctl_timeouts=5 \

The slaves of the RabbitMQ cluster were shut down by Pacemaker and never came back up.
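
To confirm that only one parameter differs between the two "params" lines above, they can be compared programmatically. The following is a minimal sketch, not part of the original report: the helper names parse_params and changed_params are hypothetical, and the trailing line-continuation backslash from the crm configuration is left out of the Python string literals.

```python
import shlex

# The two "params" lines quoted in the report (trailing "\" omitted).
BEFORE = ('params node_port=5673 debug=false command_timeout="--signal=KILL" '
          'erlang_cookie=EOKOWXQREETZSHFNTPEY max_rabbitmqctl_timeouts=3')
AFTER = ('params node_port=5673 debug=false command_timeout="--signal=KILL" '
         'erlang_cookie=EOKOWXQREETZSHFNTPEY max_rabbitmqctl_timeouts=5')


def parse_params(line):
    """Turn a crm 'params key=value ...' line into a dict (hypothetical helper)."""
    # Drop a trailing continuation backslash and spaces before tokenizing.
    tokens = shlex.split(line.rstrip("\\ ").strip())
    if tokens and tokens[0] == "params":
        tokens = tokens[1:]
    # Split on the first "=" only, so values such as --signal=KILL stay intact.
    return dict(token.split("=", 1) for token in tokens)


def changed_params(before, after):
    """Return {name: (old, new)} for every parameter whose value differs."""
    old, new = parse_params(before), parse_params(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    print(changed_params(BEFORE, AFTER))
```

For the lines in this report, the only difference is max_rabbitmqctl_timeouts going from 3 to 5.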

How to reproduce:

1. Install an environment in HA mode (3 controllers).
2. Start the RabbitMQ service via Pacemaker.
3. Make sure the cluster is in a proper state: the command "rabbitmqctl cluster_status" shows 3 nodes in the cluster.

root@node-1:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-2','rabbit@node-3','rabbit@node-1']},
{cluster_name,<<"<email address hidden>">>},
{partitions,[]}]
...done.

4. Run "crm configure edit p_rabbitmq-server" on a controller.
5. Change any value.
6. Save the config and exit the editor.
7. After a minute, check the RabbitMQ cluster status using "rabbitmqctl cluster_status".

root@node-1:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-1']},
{cluster_name,<<"<email address hidden>">>},
{partitions,[]}]
...done.

8. After 2 minutes, check the RabbitMQ cluster status with the command "pcs status":

Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
p_rabbitmq-server (ocf::fuel:rabbitmq-server): FAILED node-2.domain.local
p_rabbitmq-server (ocf::fuel:rabbitmq-server): FAILED node-3.domain.local

and then

Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Masters: [ node-1.domain.local ]
Stopped: [ node-2.domain.local node-3.domain.local ]

after restart

Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Masters: [ node-1.domain.local ]
Slaves: [ node-2.domain.local node-3.domain.local ]
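
The manual comparison of "rabbitmqctl cluster_status" output in steps 3 and 7 can be automated. The sketch below is an illustration only: the function names are hypothetical, and the parsing assumes the Erlang-term output format shown in this report (the format may differ in other RabbitMQ versions). It reports the configured nodes that are missing from running_nodes.

```python
import re

# Output captured in step 7 of the report (two slaves are down).
DEGRADED = """Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-1']},
{cluster_name,<<"<email address hidden>">>},
{partitions,[]}]
...done.
"""

# Output captured in step 3 of the report (all three nodes are running).
HEALTHY = DEGRADED.replace(
    "{running_nodes,['rabbit@node-1']}",
    "{running_nodes,['rabbit@node-2','rabbit@node-3','rabbit@node-1']}",
)


def _quoted_names(text):
    """Extract every single-quoted node name from a fragment of the output."""
    return set(re.findall(r"'([^']+)'", text))


def missing_nodes(output):
    """Return the sorted list of configured nodes that are not running."""
    # Everything between "{nodes," and "{running_nodes," lists configured nodes.
    configured = re.search(r"\{nodes,(.*?)\{running_nodes,", output, re.S)
    # The running list ends at the first closing bracket.
    running = re.search(r"\{running_nodes,\[(.*?)\]", output, re.S)
    configured_set = _quoted_names(configured.group(1)) if configured else set()
    running_set = _quoted_names(running.group(1)) if running else set()
    return sorted(configured_set - running_set)


if __name__ == "__main__":
    print("healthy :", missing_nodes(HEALTHY))
    print("degraded:", missing_nodes(DEGRADED))
```

On a live controller the same function could be fed the captured output of "rabbitmqctl cluster_status"; an empty result corresponds to the healthy state from step 3.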

Tags:

Andrii Petrenko (aplsms) on 2016-02-16

tags:	added: customer-found support
Changed in mos:
importance:	Undecided → High

Andrii Petrenko (aplsms) on 2016-02-16

no longer affects:

mos/7.0.x
