Fuel for OpenStack

system test change_pacemaker_parameter_not_break_rabbitmq failed with TimeoutError: All nodes are staying in the cluster

Bug #1591261 reported by Artem Hrechanychenko on 2016-06-10

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Released	High	Andrey Sledzinskiy	Fuel for OpenStack 10.0
Mitaka	Fix Released	High	Andrey Sledzinskiy	Fuel for OpenStack 9.1
Newton	Fix Released	High	Andrey Sledzinskiy	Fuel for OpenStack 10.0

Bug Description

Detailed bug description:
  Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ha_neutron_destructive/fuelweb_test/helpers/decorators.py", line 120, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ha_neutron_destructive/fuelweb_test/tests/tests_strength/test_failover.py", line 353, in change_pacemaker_parameter_not_break_rabbitmq
    super(self.__class__, self). \
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ha_neutron_destructive/fuelweb_test/tests/tests_strength/test_failover_base.py", line 1296, in change_pacemaker_parameter_not_break_rabbitmq
    timeout_msg='All nodes are staying in the cluster')
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/helpers.py", line 100, in wait
    raise TimeoutError(timeout_msg)
TimeoutError: All nodes are staying in the cluster

From the logs we see:
>>No running rabbitmq nodes found on slave-03

>>Cluster status of node 'rabbit@messaging-node-1' ...
[{nodes,[{disc,['rabbit@messaging-node-1','rabbit@messaging-node-2','rabbit@messaging-node-3']}]},{alarms,[{'rabbit@messaging-node-2',[]},{'rabbit@messaging-node-3',[]}]}]

>>Unexpected exit_code returned: actual 143, expected 0 69 70 75. Command: 'rabbitmqctl cluster_status' Details:
Host: 10.109.5.5
Command: 'rabbitmqctl cluster_status'
Exit code: 143
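
A note on reading the exit code above: by the common shell convention, a status above 128 means the process was ended by signal number (status - 128), so 143 corresponds to signal 15 (SIGTERM). The report does not say what sent the signal. The helper below is only an illustrative sketch of that decoding; `describe_exit_code` is a hypothetical name, not part of fuel-qa or devops.

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit status (hypothetical helper, not from fuel-qa)."""
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            return "terminated by {}".format(signal.Signals(signum).name)
        except ValueError:
            return "terminated by unknown signal {}".format(signum)
    return "exited with status {}".format(code)


print(describe_exit_code(143))
```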

We see that the RabbitMQ cluster is not in an available state after reverting the snapshot of a pre-deployed cluster with 3 controllers.

Steps to reproduce:
            1. Deploy environment with at least 3 controllers <<<<failed here
            2. Change the max_rabbitmqctl_timeouts parameter on one of the
               controllers; after that, the slave RabbitMQ instances will be
               restarted by Pacemaker.
            3. Wait for 3 minutes.
            4. Check that the RabbitMQ cluster is assembled, retrying until success for up to 10 min
            5. Run OSTF
            6. Repeat steps 2-5 two more times
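
Step 4 (retry until the cluster is assembled, or give up after a deadline) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code used by the test: `wait_for` and `cluster_is_assembled` are hypothetical names, and the real suite uses the `wait` helper from devops that appears in the traceback, whose exact signature is not shown in this report.

```python
import time


def wait_for(predicate, timeout, interval=1.0, timeout_msg="timed out",
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration; raises TimeoutError on expiry.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise TimeoutError(timeout_msg)
        sleep(interval)


def cluster_is_assembled(running_nodes, expected_count):
    """True when the number of distinct running nodes equals expected_count."""
    return len(set(running_nodes)) == expected_count


# Example: simulated successive views of the running RabbitMQ nodes.
views = iter([[], ["node-1"], ["node-1", "node-2", "node-3"]])
wait_for(lambda: cluster_is_assembled(next(views), 3),
         timeout=600, interval=0, timeout_msg="RabbitMQ cluster not assembled")
```

In a real run the predicate would query the controllers instead of reading from a prepared list, and the 600-second timeout mirrors the 10 minutes named in step 4.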

Expected results:
RabbitMQ cluster is available after reverting the env prepared with the prepare_ha_neutron test

Actual result:
TimeoutError: All nodes are staying in the cluster
Reproducibility:
https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_destructive/137/consoleFull

Workaround:
-
Impact:
swarm
Description of the environment:
cat /etc/fuel_build_id:
465
cat /etc/fuel_build_number:
465
cat /etc/fuel_release:
9.0
cat /etc/fuel_openstack_version:
mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
fuel-release-9.0.0-1.mos6349.noarch
fuel-misc-9.0.0-1.mos8454.noarch
python-packetary-9.0.0-1.mos140.noarch
fuel-bootstrap-cli-9.0.0-1.mos285.noarch
fuel-migrate-9.0.0-1.mos8454.noarch
rubygem-astute-9.0.0-1.mos750.noarch
fuel-mirror-9.0.0-1.mos140.noarch
shotgun-9.0.0-1.mos90.noarch
fuel-openstack-metadata-9.0.0-1.mos8742.noarch
fuel-notify-9.0.0-1.mos8454.noarch
nailgun-mcagents-9.0.0-1.mos750.noarch
python-fuelclient-9.0.0-1.mos325.noarch
fuel-9.0.0-1.mos6349.noarch
fuel-utils-9.0.0-1.mos8454.noarch
fuel-setup-9.0.0-1.mos6349.noarch
fuel-provisioning-scripts-9.0.0-1.mos8742.noarch
fuel-library9.0-9.0.0-1.mos8454.noarch
network-checker-9.0.0-1.mos74.x86_64
fuel-agent-9.0.0-1.mos285.noarch
fuel-ui-9.0.0-1.mos2717.noarch
fuel-ostf-9.0.0-1.mos935.noarch
fuelmenu-9.0.0-1.mos274.noarch
fuel-nailgun-9.0.0-1.mos8742.noarch
Additional information:

Tags:

Artem Hrechanychenko (agrechanichenko) wrote on 2016-06-10:

fail_error_change_pacemaker_parameter_not_break_rabbitmq-fuel-snapshot-2016-06-10_06-36-40.tar.gz (51.5 MiB, application/x-tar)

Artem Hrechanychenko (agrechanichenko) on 2016-06-10

tags:

added: area-qa
removed: high

Aleksey Zvyagintsev (azvyagintsev) on 2016-06-11

Changed in fuel:
status:	New → Confirmed

Andrey Sledzinskiy (asledzinskiy) on 2016-06-13

Changed in fuel:
assignee:	Fuel QA Team (fuel-qa) → Andrey Sledzinskiy (asledzinskiy)

OpenStack Infra (hudson-openstack) wrote on 2016-06-14: Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/329319

Changed in fuel:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-06-14: Fix proposed to fuel-qa (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/329389

OpenStack Infra (hudson-openstack) wrote on 2016-06-14: Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/329319
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=fdf57d26289928730c7135d3b411b4c9bacb8558
Submitter: Jenkins
Branch: master

commit fdf57d26289928730c7135d3b411b4c9bacb8558
Author: asledzinskiy <email address hidden>
Date: Tue Jun 14 11:33:17 2016 +0300

Fix rabbit nodes count after pacemaker changes

- When we change pacemaker parameter all nodes are leaving
cluster so expected number of rabbit nodes is 0

Change-Id: I3bf19c2de1eaaa9b1cd1a526fd11ad5773c2d529
Closes-Bug: #1591261
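
The commit message describes a two-phase expectation: after the Pacemaker parameter change, the number of running RabbitMQ nodes should first drop to 0, and only then should the cluster reassemble. The sketch below illustrates that check on a recorded sequence of observations. It is an assumption-laden illustration, not the actual fuel-qa change; `check_restart_cycle` and its arguments are hypothetical.

```python
def check_restart_cycle(samples, total_nodes=3):
    """Return True if the running-node count first reaches 0 and later
    reaches total_nodes (hypothetical helper, not from fuel-qa).

    samples: successive lists of running node names, oldest first.
    """
    all_left = False
    for running in samples:
        if not all_left:
            # Phase 1: wait until every node has left the cluster.
            all_left = len(running) == 0
        elif len(set(running)) == total_nodes:
            # Phase 2: the cluster is fully assembled again.
            return True
    return False


observed = [["n1", "n2", "n3"], ["n1"], [], ["n1"], ["n1", "n2", "n3"]]
print(check_restart_cycle(observed))
```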

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-06-14: Fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/329389
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=4181d3cfcefa313e2b7c9435ca10be847e72052c
Submitter: Jenkins
Branch: stable/mitaka

commit 4181d3cfcefa313e2b7c9435ca10be847e72052c
Author: asledzinskiy <email address hidden>
Date: Tue Jun 14 11:33:17 2016 +0300

Fix rabbit nodes count after pacemaker changes

- When we change pacemaker parameter all nodes are leaving
cluster so expected number of rabbit nodes is 0

Change-Id: I3bf19c2de1eaaa9b1cd1a526fd11ad5773c2d529
Closes-Bug: #1591261

Nastya Urlapova (aurlapova) wrote on 2016-06-21:

https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_destructive/148/testReport/(root)/change_pacemaker_parameter_not_break_rabbitmq/change_pacemaker_parameter_not_break_rabbitmq/ shows an issue with OSTF; the original issue was fixed.

Andrey Lavrentyev (alavrentyev) wrote on 2016-09-13:

Looks like this is a regression, or the bug hasn't been fixed completely.

Failure on Swarm: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/57/testReport/%28root%29/change_pacemaker_parameter_not_break_rabbitmq/

Version info:

9.1 snapshot #251

UBUNTU_MIRROR_ID=ubuntu-2016-08-03-174238
CENTOS_MIRROR_ID=centos-7.2.1511-2016-05-31-083834
MOS_UBUNTU_MIRROR_ID=9.0-2016-09-11-182323
MOS_CENTOS_OS_MIRROR_ID=os-2016-06-23-135731
MOS_CENTOS_PROPOSED_MIRROR_ID=proposed-2016-09-11-232321
MOS_CENTOS_UPDATES_MIRROR_ID=updates-2016-06-23-135916
MOS_CENTOS_HOLDBACK_MIRROR_ID=holdback-2016-06-23-140047
MOS_CENTOS_HOTFIX_MIRROR_ID=hotfix-2016-07-18-162958
MOS_CENTOS_SECURITY_MIRROR_ID=security-2016-06-23-140002