OCF script misses RabbitMQ partitioning
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Alexey Lebedeff |
Bug Description
Version: 9.0
RabbitMQ: 3.6.1-1~u14.04+mos3
RabbitMQ autoheal disabled
In an unstable network, RabbitMQ became partitioned after some time. The partition is visible only through rabbitmqctl cluster_status: http://
The partition can be confirmed by running 'rabbitmqctl list_queues': node-1 reports significantly more queues than node-189 and node-97.
At the same time
rabbitmqctl eval "mnesia:
reports that all nodes are online, so the OCF script does not see the partition.
We need to teach the RabbitMQ OCF script to detect such partitions.
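As a rough illustration of the kind of check being asked for, the sketch below looks for a non-empty `{partitions,[...]}` entry in the text printed by `rabbitmqctl cluster_status`. It assumes the Erlang-term output format of the RabbitMQ 3.6 series. The sample outputs and the function names `extract_partitions` and `is_partitioned` are hypothetical, and this is not the logic of the actual OCF script.

```python
import re

# Hypothetical sample outputs in the RabbitMQ 3.6 Erlang-term format.
SAMPLE_PARTITIONED = """Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-189','rabbit@node-97']}]},
 {running_nodes,['rabbit@node-1']},
 {cluster_name,<<"rabbit@node-1">>},
 {partitions,[{'rabbit@node-1',['rabbit@node-189',
                                'rabbit@node-97']}]}]
"""

SAMPLE_HEALTHY = """Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-189','rabbit@node-97']}]},
 {running_nodes,['rabbit@node-97','rabbit@node-189','rabbit@node-1']},
 {cluster_name,<<"rabbit@node-1">>},
 {partitions,[]}]
"""


def extract_partitions(status_text):
    """Return the raw Erlang list that follows `{partitions,`, or None."""
    marker = status_text.find("{partitions,")
    if marker == -1:
        return None
    start = status_text.find("[", marker)
    if start == -1:
        return None
    depth = 0
    # Walk forward until the opening bracket is balanced again.
    for i in range(start, len(status_text)):
        ch = status_text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return status_text[start:i + 1]
    return None


def is_partitioned(status_text):
    """True if the status text reports a non-empty partitions list."""
    raw = extract_partitions(status_text)
    if raw is None:
        return False
    return re.sub(r"\s+", "", raw) != "[]"
```

On a live node the text could come from something like `subprocess.run(["rabbitmqctl", "cluster_status"], capture_output=True, text=True).stdout`. Newer RabbitMQ releases print a different format, so this parser would need adjusting there.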
Attached are results of 'rabbitmqctl list_queues messages consumers name' in files lst-*. RabbitMQ logs from the controllers are also attached.
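The queue-count comparison described above can be sketched as follows. The code counts queue rows in the output of 'rabbitmqctl list_queues messages consumers name' for each node and reports the spread between the largest and smallest count. It assumes tab-separated rows with two numeric columns followed by the queue name. The sample listings, queue names, and function names are made up for illustration and are not taken from the attached lst-* files.

```python
# Hypothetical listings, one per node; real data would come from the lst-* files.
SAMPLE_LISTINGS = {
    "node-1": "Listing queues ...\n0\t1\tqueue_a\n0\t1\tqueue_b\n3\t0\tqueue_c\n",
    "node-189": "Listing queues ...\n0\t1\tqueue_a\n",
    "node-97": "Listing queues ...\n0\t1\tqueue_a\n",
}


def count_queues(listing_text):
    """Count rows that look like `<messages>\\t<consumers>\\t<name>`."""
    count = 0
    for line in listing_text.splitlines():
        fields = line.split("\t")
        # Skip banner lines such as "Listing queues ...".
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            count += 1
    return count


def queue_count_spread(listings):
    """Return per-node queue counts and the max-min difference between them."""
    counts = {node: count_queues(text) for node, text in listings.items()}
    return counts, max(counts.values()) - min(counts.values())
```

A large spread between nodes that should share the same queue set is a hint of a partition, not proof of one. Any threshold for it would have to be chosen per deployment.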
Changed in fuel: | |
importance: | Undecided → High |
assignee: | nobody → MOS Oslo (mos-oslo) |
milestone: | none → 9.1 |
status: | New → Confirmed |
tags: | added: area-library |
Changed in fuel: | |
assignee: | MOS Oslo (mos-oslo) → Alexey Lebedeff (alebedev-a) |
Changed in fuel: | |
status: | Confirmed → Fix Committed |
tags: | added: on-verification |
Reviewed: https://review.openstack.org/360484
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=fb5fe24e6e4bf03d4e5e204cf378f51d019718c6
Submitter: Jenkins
Branch: stable/mitaka
commit fb5fe24e6e4bf03d4e5e204cf378f51d019718c6
Author: Alexey Lebedeff <email address hidden>
Date: Thu Aug 25 15:13:36 2016 +0300
Perform rabbit partition checks from OCF script
Partitioned nodes are ordered to restart by master. It may sound like
`autoheal`, but the problem is that OCF script and `autoheal` are not
compatible because concepts of master in pacemaker and winner in
autoheal are completely unrelated.
Upstream change: https://github.com/rabbitmq/rabbitmq-server/pull/939
Change-Id: I79bc2054a0ea0f04917130779e3380777960b1d6
Closes-Bug: 1616581
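To make the idea in the commit message concrete, here is a minimal sketch of how a master could decide which nodes to order to restart. It is an assumption-laden illustration, not the code from the commit: the function name `nodes_to_restart`, the input mapping, and the rule "restart every node that the master reports as partitioned, or that reports the master as partitioned" are all hypothetical.

```python
def nodes_to_restart(master, partitions_by_node):
    """Pick nodes that are partitioned from the Pacemaker master.

    partitions_by_node maps each node name to the list of nodes it
    reports as partitioned from itself (hypothetical input shape).
    """
    # Nodes the master itself sees as partitioned.
    reported_by_master = set(partitions_by_node.get(master, []))
    # Nodes that see the master as partitioned from them.
    reporting_master = {
        node
        for node, peers in partitions_by_node.items()
        if node != master and master in peers
    }
    return sorted(reported_by_master | reporting_master)


# Hypothetical example: node-1 is the Pacemaker master.
EXAMPLE = {
    "rabbit@node-1": ["rabbit@node-189"],
    "rabbit@node-189": [],
    "rabbit@node-97": ["rabbit@node-1"],
}
```

The decision is keyed to the Pacemaker master and not to an autoheal "winner", which reflects the incompatibility the commit message describes.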