Rabbitmq OCF script requires additional criteria to be met for Master/Slave statuses
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fuel for OpenStack | Fix Committed | High | Bogdan Dobrelya | |
5.1.x | Fix Committed | High | Bogdan Dobrelya | |
6.0.x | Fix Committed | High | Bogdan Dobrelya | |
6.1.x | Fix Committed | High | Bogdan Dobrelya | |
Bug Description
The build of http://
The other floating issues with rabbitmq clustering are:
* The forget_cluster_node command could take a long time (even longer than the time allotted to the post-stop notify event) if the rabbit node is under heavy load.
* It is also possible that all rabbitmq resources persist as slaves and no master is ever elected (see the duplicate bug/1401956).
* Sometimes join_cluster could take quite a long time. If the timeout is exceeded, the node enters a join-wait-reset loop forever.
In order to fix it, we should:
- additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise, the Stopped state should be reported.
- wrap rabbitmqctl commands in timeout with the -KILL signal
- use disconnect_node prior to issuing forget_cluster_node
- thoroughly re-examine the OCF script logic and fix it (see the commit message for a related patch below)
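
The first three items above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual patch: the helper names (run_with_timeout, node_is_healthy, forget_node), the CTL_TIMEOUT variable and its 30 second default are hypothetical, and invoking disconnect_node through 'rabbitmqctl eval' is an assumption about how that step would be issued. The return codes 0 and 7 are the standard OCF_SUCCESS and OCF_NOT_RUNNING values.

```sh
#!/bin/sh
# Hypothetical helpers; names and defaults are NOT taken from the real OCF script.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"
CTL_TIMEOUT="${CTL_TIMEOUT:-30}"   # seconds, assumed default

# Run a command under coreutils timeout; send SIGKILL once the limit is hit,
# so a hung rabbitmqctl cannot block the resource agent operation.
run_with_timeout() {
    timeout -s KILL "$CTL_TIMEOUT" "$@"
}

# Extra liveness criterion: the node counts as running only if
# 'rabbitmqctl list_users' succeeds within the time limit.
# Returns 0 (OCF_SUCCESS) or 7 (OCF_NOT_RUNNING).
node_is_healthy() {
    if run_with_timeout $RABBITMQCTL list_users >/dev/null 2>&1; then
        return 0
    fi
    return 7
}

# Disconnect the Erlang node first, then remove it from the cluster.
# A failed disconnect is ignored here (assumption): the node may already be gone.
forget_node() {
    run_with_timeout $RABBITMQCTL eval "disconnect_node('$1')." || true
    run_with_timeout $RABBITMQCTL forget_cluster_node "$1"
}
```

In a monitor action, the result of node_is_healthy would be consulted before reporting the Master or Slave state, and a non-zero result would be mapped to the Stopped state. A call such as 'forget_node rabbit@node-2' (node name is an example) would be used from the notify handler.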
Changed in fuel: | |
importance: | Undecided → High |
status: | New → Triaged |
milestone: | none → 6.0 |
summary: |
- Rabbitmq OCF script requires additional criterias to be met for
+ Rabbitmq OCF script requires additional criteria to be met for
Master/Slave statuses |
description: | updated |
description: | updated |
Related https://bugs.launchpad.net/fuel/+bug/1339080