[System tests] Need to add tests to check rabbit node kick in case corosync node dies

Bug #1443827 reported by Andrey Sledzinskiy
Affects             Status        Importance  Assigned to         Milestone
Fuel for OpenStack  Fix Released  High        Andrey Sledzinskiy
6.0.x               Invalid       Undecided   Unassigned

Bug Description

We need system tests that cover the case described in https://bugs.launchpad.net/fuel/+bug/1437348/
Steps are:
0) deploy any HA environment with 3 controllers;
on one of the controller nodes, run "pcs resource unmanage master_p_rabbitmq-server"

Alive nodes should not be kicked:
1) on the first controller (for example, node-1), stop the corosync service gracefully
2) on the master node, check /var/log/remote/node-*/rabbit-fence.log:
* it should contain messages like:
"Got node-1.test.domain.local that left cluster
...
Preparing to fence node rabbit@node-1 from rabbit cluster
... (within 1 minute) ...
Ignoring alive node rabbit@node-1"
3) on the other controllers (not node-1, where corosync was stopped), check rabbitmq cluster_status:
* it should list all 3 rabbit nodes as running and as cluster members
4) teardown:
* start the stopped corosync service and restart the pacemaker service on the same node
* pcs status should show all 3 nodes online within 1 minute
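
The log check in step 2 could be automated along these lines. This is a minimal sketch, not the code merged in fuel-qa: the helper name contains_in_order and the constant ALIVE_NODE_MESSAGES are hypothetical, and it assumes the content of rabbit-fence.log has already been read into a string. It only verifies that the expected messages appear in the expected order, allowing any other lines in between.

```python
def contains_in_order(log_text, expected):
    """Return True if every expected fragment occurs in log_text, in order.

    Other log lines may appear between the fragments.
    """
    position = 0
    for fragment in expected:
        found = log_text.find(fragment, position)
        if found == -1:
            return False
        # Continue searching after the fragment that was just matched.
        position = found + len(fragment)
    return True


# Messages quoted in the bug description for the "alive node" scenario.
ALIVE_NODE_MESSAGES = [
    "Got node-1.test.domain.local that left cluster",
    "Preparing to fence node rabbit@node-1 from rabbit cluster",
    "Ignoring alive node rabbit@node-1",
]
```

The same helper would work for the second scenario with a different message list (ending in "Disconnecting rabbit@node-1" and "Forgetting cluster node rabbit@node-1").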

A failed rabbit node should be kicked only once:
5) on the first controller (for example, node-1), run rabbitmqctl stop_app, then stop the corosync service gracefully
6) on the master node, check /var/log/remote/node-*/rabbit-fence.log:
* the log of at least one controller node should contain messages like:
"Got node-1.test.domain.local that left cluster
...
Preparing to fence node rabbit@node-1 from rabbit cluster
... (within 1 minute) ...
Disconnecting rabbit@node-1
Forgetting cluster node rabbit@node-1"
7) on the other controllers (not node-1, where corosync was stopped), check rabbitmq cluster_status:
* it should list only 2 rabbit nodes as running and as cluster members (node-1 should not be listed)
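
The cluster_status checks could be sketched as below. This is an illustration under stated assumptions, not the merged test code: the function name parse_cluster_status is hypothetical, and it assumes the command prints the Erlang-term style output of older RabbitMQ 3.x releases, with a {nodes,...} entry followed by a {running_nodes,[...]} entry. Other RabbitMQ versions print a different format and would need a different parser.

```python
import re

# Matches node names such as rabbit@node-1 (without the surrounding quotes).
NODE_RE = re.compile(r"rabbit@[\w.-]+")


def parse_cluster_status(output):
    """Return (members, running) as sets of node names.

    Assumes Erlang-term style output where {nodes,...} precedes
    {running_nodes,[...]}; returns empty sets if a section is missing.
    """
    members_match = re.search(r"\{nodes,(.*?)\{running_nodes,", output, re.S)
    running_match = re.search(r"\{running_nodes,\[(.*?)\]\}", output, re.S)
    members = set(NODE_RE.findall(members_match.group(1))) if members_match else set()
    running = set(NODE_RE.findall(running_match.group(1))) if running_match else set()
    return members, running
```

For the first scenario, a test would expect both sets to contain all 3 nodes; for the second, both sets should contain exactly the 2 surviving nodes and not rabbit@node-1.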

Tags: system-tests
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/173260

Changed in fuel:
status: New → In Progress
Changed in fuel:
assignee: Andrey Sledzinskiy (asledzinskiy) → Nastya Urlapova (aurlapova)
Changed in fuel:
assignee: Nastya Urlapova (aurlapova) → Andrey Sledzinskiy (asledzinskiy)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/173260
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=03c12079c5908bd125872c37efe77d29ed2c20aa
Submitter: Jenkins
Branch: master

commit 03c12079c5908bd125872c37efe77d29ed2c20aa
Author: asledzinskiy <email address hidden>
Date: Mon Apr 13 16:45:38 2015 +0300

    Add corosync failover tests

    - Add test to check that in case of corosync node dies
    alive rabbit node isn't kicked from cluster
    - Add test to check that in case of corosync node dies
    dead rabbit node is kicked from cluster

    Change-Id: I48b1d80dc81b453d30ca1c66c97b94d215422015
    Closes-Bug: #1443827

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → Fix Released