OCF script misses RabbitMQ partitioning

Bug #1616581 reported by Dmitry Mescheryakov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Alexey Lebedeff

Bug Description

Version: 9.0
RabbitMQ: 3.6.1-1~u14.04+mos3
RabbitMQ autoheal: disabled

After some time in an unstable network, RabbitMQ became partitioned. The partition is visible only through 'rabbitmqctl cluster_status': http://paste.openstack.org/show/563110/
The partition can be confirmed by running 'rabbitmqctl list_queues': node-1 reports significantly more queues than node-189 and node-97.
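
The queue-count comparison can be scripted. The following is a minimal Python sketch, not part of the original report, that assumes each node's 'rabbitmqctl list_queues messages consumers name' output is available as text (for example, the attached lst-* files) and that data rows are tab-separated as <messages>, <consumers>, <name>. The function names are hypothetical.

```python
from typing import Dict, Set


def parse_queue_names(output: str) -> Set[str]:
    """Collect queue names from 'list_queues messages consumers name' output.

    Assumption: data rows look like '<messages>\t<consumers>\t<name>'.
    Header/footer lines such as 'Listing queues ...' do not match that
    shape and are skipped.
    """
    names = set()
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            names.add(fields[2])
    return names


def queues_missing_per_node(outputs: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each node, return the queues that some other node sees but it does not."""
    views = {node: parse_queue_names(out) for node, out in outputs.items()}
    all_queues = set().union(*views.values())
    return {node: all_queues - seen for node, seen in views.items()}


if __name__ == "__main__":
    # Illustrative input only; real data would come from the lst-* files.
    sample = {
        "node-1": "Listing queues ...\n0\t0\tq1\n3\t1\tq2\n",
        "node-97": "Listing queues ...\n0\t0\tq1\n",
    }
    for node, missing in queues_missing_per_node(sample).items():
        print(node, sorted(missing))
```

In a healthy cluster every node should report the same queue set, so any non-empty result hints at diverged views; it does not by itself prove a partition.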

At the same time,
rabbitmqctl eval "mnesia:system_info(running_db_nodes)."
reports that all nodes are online, so the OCF script does not see the partition.
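
Because running_db_nodes still lists every node, a check has to look at the 'partitions' entry of 'rabbitmqctl cluster_status' instead. Below is a minimal Python sketch of that parsing step; it is not the merged OCF change (the OCF script is shell). It assumes the Erlang-term output printed by RabbitMQ 3.6, where partitions appear as {partitions,[{Node,[PartitionedFrom,...]}]}. The helper names are hypothetical, and the test input is illustrative, not the paste linked above.

```python
import re
from typing import Dict, List

# An Erlang atom: either single-quoted ('rabbit@node-1') or bare (rabbit@node1).
ATOM = r"'[^']*'|[a-z][A-Za-z0-9_@.]*"


def _extract_list(text: str, key: str) -> str:
    """Return the Erlang list following '{key,' in text, brackets included."""
    m = re.search(r"\{" + re.escape(key) + r",\s*\[", text)
    if m is None:
        return ""
    start = m.end() - 1  # index of the opening '['
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return ""  # unbalanced brackets: treat as no data


def parse_partitions(status: str) -> Dict[str, List[str]]:
    """Map each reporting node to the nodes it is partitioned from."""
    body = _extract_list(status, "partitions")
    result: Dict[str, List[str]] = {}
    pair = r"\{(" + ATOM + r"),\s*\[([^\]]*)\]\}"
    for node, peers in re.findall(pair, body):
        result[node.strip("'")] = [p.strip("'") for p in re.findall(ATOM, peers)]
    return result


def is_partitioned(status: str) -> bool:
    """True if any node reports at least one partitioned peer."""
    return any(parse_partitions(status).values())
```

An empty or missing 'partitions' entry yields an empty map, so the check reports no partition in that case.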

We need to teach the RabbitMQ OCF script to detect such partitions.

Attached are the results of 'rabbitmqctl list_queues messages consumers name' (files lst-*) and the RabbitMQ logs from the controllers.

Changed in fuel:
importance: Undecided → High
assignee: nobody → MOS Oslo (mos-oslo)
milestone: none → 9.1
status: New → Confirmed
tags: added: area-library
Dmitry Mescheryakov (dmitrymex) wrote :
description: updated
Changed in fuel:
assignee: MOS Oslo (mos-oslo) → Alexey Lebedeff (alebedev-a)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/360484
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=fb5fe24e6e4bf03d4e5e204cf378f51d019718c6
Submitter: Jenkins
Branch: stable/mitaka

commit fb5fe24e6e4bf03d4e5e204cf378f51d019718c6
Author: Alexey Lebedeff <email address hidden>
Date: Thu Aug 25 15:13:36 2016 +0300

    Perform rabbit partition checks from OCF script

    The master orders partitioned nodes to restart. This may sound like
    `autoheal`, but the OCF script and `autoheal` are not compatible,
    because the concept of a master in pacemaker and that of a winner in
    autoheal are completely unrelated.

    Upstream change: https://github.com/rabbitmq/rabbitmq-server/pull/939

    Change-Id: I79bc2054a0ea0f04917130779e3380777960b1d6
    Closes-Bug: 1616581
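
The commit message says the master orders partitioned nodes to restart. The following hypothetical Python sketch illustrates one such decision rule; it is an assumption for illustration, not the logic of the merged OCF script. It takes a map from each node to the nodes it reports being partitioned from (for example, gathered from each node's cluster_status output).

```python
from typing import Dict, List, Set


def nodes_to_restart(master: str, partitions: Dict[str, List[str]]) -> Set[str]:
    """Hypothetical rule: restart every non-master node that is on the
    other side of a partition from the pacemaker master.

    partitions maps a node name to the nodes it reports being
    partitioned from.
    """
    # Nodes the master itself reports as unreachable.
    victims = set(partitions.get(master, []))
    # Nodes that report the master as unreachable.
    for node, peers in partitions.items():
        if node != master and master in peers:
            victims.add(node)
    victims.discard(master)  # the master never orders itself to restart
    return victims
```

Treating the master's view as authoritative reflects the point of the commit message: the pacemaker master, not an autoheal winner, decides which nodes restart. This sketch does not handle a partition between two non-master nodes that both still see the master.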

tags: added: in-stable-mitaka
Changed in fuel:
status: Confirmed → Fix Committed
Anna Babich (ababich)
tags: added: on-verification
Anna Babich (ababich) wrote :

Verified on MOS 9.1, SNAPSHOT_ID=#209

The required patch is present in /usr/lib/ocf/resource.d/fuel/rabbitmq-server-upstream on the controllers.

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification