RabbitMQ OCF script can not handle certain cluster partitions

Bug #1628487 reported by Dmitry Mescheryakov
This bug affects 1 person
Affects               Status         Importance   Assigned to        Milestone
Fuel for OpenStack    Won't Fix      High         Alexey Lebedeff
  Mitaka              Fix Released   High         Alexey Lebedeff
  Newton              Confirmed      High         Alexey Lebedeff
Nominated for Ocata by Sergii Golovatiuk

Bug Description

Version: 9.1

1. Deploy an OpenStack environment consisting of 5 nodes and stress it for some time with Jepsen. Details on how to run Jepsen in a Fuel environment can be found at https://github.com/bogdando/jepsen/tree/fuel/noop in the section "How-to run tests from the Fuel master as a Jepsen control node".

After several runs the RabbitMQ cluster becomes partitioned, but the OCF script ignores this. The cluster status then looks like this:

root@node-1:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@messaging-node-1' ...
...
 {partitions,
     [{'rabbit@messaging-node-1',
          ['rabbit@messaging-node-2','rabbit@messaging-node-4']}]},
...
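
For anyone who wants to check for this state by hand, here is a minimal sketch of pulling the `partitions` entry out of `rabbitmqctl cluster_status` output like the one above. It is not part of the OCF script; the function name `parse_partitions` is made up for illustration, and it assumes the Erlang-term output format shown above with single-quoted node names.

```python
import re


def parse_partitions(status_text):
    """Return {reporting_node: [nodes it reports as partitioned]}.

    Hypothetical helper: assumes the Erlang-term style output shown in
    this report, with node names wrapped in single quotes.
    """
    start = status_text.find("{partitions,")
    if start == -1:
        return {}
    # Walk forward from the first '[' to the bracket that closes the list.
    open_idx = status_text.index("[", start)
    depth = 0
    body = None
    for i in range(open_idx, len(status_text)):
        ch = status_text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                body = status_text[open_idx + 1:i]
                break
    if body is None:
        return {}
    result = {}
    # Each entry has the shape {'node', ['peer1','peer2',...]}.
    for node, peers in re.findall(r"\{'([^']+)',\s*\[([^\]]*)\]\}", body):
        result[node] = re.findall(r"'([^']+)'", peers)
    return result


sample = """Cluster status of node 'rabbit@messaging-node-1' ...
 {partitions,
     [{'rabbit@messaging-node-1',
          ['rabbit@messaging-node-2','rabbit@messaging-node-4']}]},
"""
partitions = parse_partitions(sample)
```

An empty result means the node reports no partitions; any non-empty result corresponds to the situation described in this bug.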

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/378600

Changed in fuel:
assignee: nobody → Alexey Lebedeff (alebedev-a)
Changed in fuel:
importance: Undecided → High
status: New → In Progress
milestone: none → 9.1
Roman Vyalov (r0mikiam)
Changed in fuel:
status: In Progress → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/378600
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=852ef73b0b079c977924b4a701575efa01171080
Submitter: Jenkins
Branch: stable/mitaka

commit 852ef73b0b079c977924b4a701575efa01171080
Author: Alexey Lebedeff <email address hidden>
Date: Thu Sep 29 16:17:43 2016 +0300

    OCF RA: Check partitions on non-master nodes

    Upstream PR: https://github.com/rabbitmq/rabbitmq-server-release/pull/1

    Partitions reported by `rabbit_node_monitor:partitions/0` are not
    commutative (i.e. node1 can report itself as partitioned with node2, but
    not vice versa).

    Given that we now have strong notion of master in OCF script, we can
    check for those fishy situations during master health check, and order
    damaged nodes to restart.

    Change-Id: I80a920725575c36d30726e519a95835fd5982ce0
    Closes-Bug: 1628487
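
To illustrate the asymmetry the commit message describes, here is a rough sketch of how one-sided partition reports could be spotted once every node's own view has been collected. This is not the code from the fix: the function names, the `reports` mapping, and the restart policy (restart every non-master node that reports any partition) are assumptions made for this example.

```python
def asymmetric_partitions(reports):
    """Return sorted (a, b) pairs where a reports b as partitioned
    but b does not report a.

    `reports` is a hypothetical mapping of node name to the list of
    nodes that node itself reports as partitioned.
    """
    pairs = []
    for node, peers in reports.items():
        for peer in peers:
            if node not in reports.get(peer, []):
                pairs.append((node, peer))
    return sorted(pairs)


def nodes_to_restart(reports, master):
    """Assumed policy: every non-master node reporting any partition."""
    return sorted(
        node for node, peers in reports.items() if peers and node != master
    )


# Made-up example data: node-1 sees node-2 as partitioned,
# but node-2 reports nothing in return.
reports = {
    "node-1": ["node-2", "node-4"],
    "node-2": [],
    "node-3": [],
    "node-4": ["node-1"],
}
one_sided = asymmetric_partitions(reports)
damaged = nodes_to_restart(reports, master="node-3")
```

The point is that looking only at the master's own partition list would miss `node-1` and `node-4` here, which is why each node's view has to be checked.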

tags: added: on-verification
Ekaterina Shutova (eshutova) wrote :

No cluster partition observed. Rabbit is ok.
Verified on:
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-9.0.0-1.mos6359.noarch
 fuel-setup-9.0.0-1.mos6359.noarch
 fuel-release-9.0.0-1.mos6359.noarch
 python-packetary-9.0.0-1.mos157.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-utils-9.0.0-1.mos8641.noarch
 fuel-ostf-9.0.0-1.mos947.noarch
 fuel-migrate-9.0.0-1.mos8641.noarch
 fuel-nailgun-9.0.0-1.mos8908.noarch
 fuel-agent-9.0.0-1.mos291.noarch
 fuel-misc-9.0.0-1.mos8641.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8908.noarch
 fuel-bootstrap-cli-9.0.0-1.mos291.noarch
 fuel-notify-9.0.0-1.mos8641.noarch
 rubygem-astute-9.0.0-1.mos782.noarch
 fuelmenu-9.0.0-1.mos276.noarch
 fuel-library9.0-9.0.0-1.mos8641.noarch
 nailgun-mcagents-9.0.0-1.mos782.noarch
 python-fuelclient-9.0.0-1.mos363.noarch
 network-checker-9.0.0-1.mos77.x86_64
 fuel-ui-9.0.0-1.mos2840.noarch
 fuel-mirror-9.0.0-1.mos157.noarch
 fuel-openstack-metadata-9.0.0-1.mos8908.noarch

tags: removed: on-verification