Make the rabbitmq OCF script check HA sync for graceful stop and demote actions

Bug #1464637 reported by Bogdan Dobrelya
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Medium
Assigned to: Matthew Mosesohn

Bug Description

When a master is gracefully demoted or a slave is gracefully stopped, it would be good to ensure that the queues owned by that node are fully synchronized elsewhere. Otherwise, data could be lost.

The graceful cases are when the operator stops or demotes the resource
by request.

Changed in fuel:
importance: Undecided → Medium
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 7.0
tags: added: rabbitmq
description: updated
Changed in fuel:
status: New → Confirmed
Changed in fuel:
importance: Medium → Wishlist
milestone: 7.0 → 8.0
Changed in fuel:
milestone: 8.0 → 7.0
status: Confirmed → Won't Fix
Matthew Mosesohn (raytrac3r) wrote :

From Bogdan:
This task may be done by analyzing the output of "rabbitmqctl list_queues slave_pids synchronised_slave_pids" and waiting until all queues are synced. The loop or method should be called wait_for_all_queues_in_sync. The loop should run for at most 50% of the timeout interval of the stop or demote action, which is defined by OCF_RESKEY_CRM_meta_timeout.
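
The suggestion above could be sketched roughly as follows. This is only an illustration, not the code that was merged. It assumes that rabbitmqctl prints tab-separated fields, that each pid is printed in angle brackets, and that OCF_RESKEY_CRM_meta_timeout is given in milliseconds. LIST_QUEUES_CMD, SYNC_POLL_INTERVAL and count_unsynced_queues are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch only. LIST_QUEUES_CMD, SYNC_POLL_INTERVAL and count_unsynced_queues
# are hypothetical names; they are not part of the OCF agent.

# Command that prints one queue per line as:
#   name<TAB>slave_pids<TAB>synchronised_slave_pids
# It is overridable so the loop can be exercised without a broker.
: "${LIST_QUEUES_CMD:=rabbitmqctl -q list_queues name slave_pids synchronised_slave_pids}"

# Read list_queues output on stdin and print the number of queues whose
# count of synchronised mirrors differs from the count of all mirrors.
count_unsynced_queues() {
    awk -F '\t' '
        NF >= 3 {
            slaves = gsub(/</, "<", $2)   # assumes each pid looks like <node.x.y.z>
            synced = gsub(/</, "<", $3)
            if (slaves != synced) unsynced++
        }
        END { print unsynced + 0 }
    '
}

# Poll until every queue is in sync, spending at most 50% of the action timeout.
# Returns 0 when all queues are synced and 1 when the time budget runs out.
# Note: a failing list command prints no lines and is treated as "synced" here.
wait_for_all_queues_in_sync() {
    timeout_ms=${OCF_RESKEY_CRM_meta_timeout:-60000}   # assumed to be milliseconds
    budget=$((timeout_ms / 2 / 1000))
    interval=${SYNC_POLL_INTERVAL:-2}
    waited=0
    while :; do
        unsynced=$($LIST_QUEUES_CMD | count_unsynced_queues)
        [ "$unsynced" -eq 0 ] && return 0
        [ "$waited" -ge "$budget" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

Counting pids rather than comparing the two lists as strings avoids false negatives when the same mirrors are listed in a different order. A stop or demote handler would call wait_for_all_queues_in_sync and proceed with the stop either way, logging when the wait timed out.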

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/233019

Changed in fuel:
status: Won't Fix → In Progress
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
Changed in fuel:
milestone: 7.0 → 8.0
importance: Wishlist → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/233019
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=9ca5b6709cd18c8134cd8feaccf937ef49a625b4
Submitter: Jenkins
Branch: master

commit 9ca5b6709cd18c8134cd8feaccf937ef49a625b4
Author: Matthew Mosesohn <email address hidden>
Date: Mon Oct 5 15:48:33 2015 +0300

    Wait for rabbitmq sync before stop/demote actions

    Added new OCF key stop_time (corresponding to start_time)
    Added wait_sync function which tries until start_time/2
    for queues on stopped/demoted node to reach synced state.

    Added optional [-t timeout] to su_rabbit_cmd function to
    provide arbitrary timeout

    Change-Id: Iae2211b3d477a9603a58d5eacb12e0fba924861a
    Closes-Bug: #1464637
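
The optional [-t timeout] argument mentioned in the commit message can be illustrated with a small wrapper. This is a hypothetical helper, not the agent's su_rabbit_cmd: it assumes the coreutils "timeout" utility is available, and the names run_cmd_with_timeout and DEFAULT_CMD_TIMEOUT are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper, not the agent's su_rabbit_cmd: run a command under
# coreutils `timeout`, with an optional "-t SECONDS" override.
run_cmd_with_timeout() {
    cmd_timeout=${DEFAULT_CMD_TIMEOUT:-30}   # assumed default, in seconds
    OPTIND=1                                 # reset so the function can be called repeatedly
    while getopts "t:" opt; do
        case "$opt" in
            t) cmd_timeout=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # `timeout` exits with status 124 when the command had to be killed.
    timeout "$cmd_timeout" "$@"
}
```

For example, "run_cmd_with_timeout -t 1 sleep 5" returns 124 after about one second, while a call without -t falls back to the default timeout.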

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/235422

Dmitry Pyzhov (dpyzhov)
tags: added: area-library
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/235422
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=0204868a14ac0aa310fc135135a88b6aec576d8c
Submitter: Jenkins
Branch: master

commit 0204868a14ac0aa310fc135135a88b6aec576d8c
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Oct 15 17:15:30 2015 +0200

    Fix the timeout arg for the su_rabbit_cmd

    And fix local bashisms as a little bonus
    Upstream patch https://github.com/rabbitmq/rabbitmq-server/pull/374

    Related-bug: #1464637

    Change-Id: I13189de9f8abce23673c031d11132e495e1972e3
    Signed-off-by: Bogdan Dobrelya <email address hidden>

tags: added: on-verification
ElenaRossokhina (esolomina) wrote :

The steps for verification/reproduction are unclear.
@bogdando, could you please update the description with additional information on how the fix can be checked?

Changed in fuel:
status: Fix Committed → Incomplete
Bogdan Dobrelya (bogdando) wrote :

Please do not move bugs with submitted commits to Incomplete. Such bugs may expire as Invalid by timeout even though they are in fact fixed.

Changed in fuel:
status: Incomplete → Fix Committed
Bogdan Dobrelya (bogdando) wrote :

The test case:
1) For a running rabbit resource master on node-X, issue the demote and stop actions as follows:

a) "OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/rabbitmq/rabbitmq-server-ha demote"

- Log records like the ones below are expected on the console, in the following order:

demote: action begin
demote: waiting.*to sync
demote: Execute stop_app with timeout
demote: action end

b) Next, run
"OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/rabbitmq/rabbitmq-server-ha stop"

- Log records like the ones below are expected on the console, in the following order:

stop: action begin
stop: waiting.*to sync
stop_server_process(): Execute stop with timeout
stop: action end
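
The ordering checks above could be automated with a small helper like the one below. It is a sketch that assumes the console output of each action was captured to a file. The function name check_log_order and the file name demote.log are hypothetical.

```shell
#!/bin/sh
# Hypothetical checker: succeed only if every pattern (an awk regular
# expression) matches some line of the log file and the matching lines
# occur in the order in which the patterns are given.
check_log_order() {
    log_file=$1
    shift
    last=0
    for pattern in "$@"; do
        # Number of the first line after $last that matches the pattern.
        line=$(awk -v start="$last" -v pat="$pattern" \
            'NR > start && $0 ~ pat { print NR; exit }' "$log_file")
        [ -n "$line" ] || return 1
        last=$line
    done
    return 0
}

# Example call for the demote case, with demote.log holding the captured output:
#   check_log_order demote.log \
#       'demote: action begin' \
#       'demote: waiting.*to sync' \
#       'demote: Execute stop_app with timeout' \
#       'demote: action end'
```

Since the patterns are treated as regular expressions, a literal such as "stop_server_process():" is safer written as "stop_server_process.*Execute stop with timeout".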

Tatyanka (tatyana-leontovich) wrote :

570 verified

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released