system test change_pacemaker_parameter_not_break_rabbitmq failed with TimeoutError: All nodes are staying in the cluster

Bug #1591261 reported by Artem Hrechanychenko
This bug affects 1 person
Affects               Status        Importance  Assigned to         Milestone
Fuel for OpenStack    Fix Released  High        Andrey Sledzinskiy
  Mitaka              Fix Released  High        Andrey Sledzinskiy
  Newton              Fix Released  High        Andrey Sledzinskiy

Bug Description

Detailed bug description:
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ha_neutron_destructive/fuelweb_test/helpers/decorators.py", line 120, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ha_neutron_destructive/fuelweb_test/tests/tests_strength/test_failover.py", line 353, in change_pacemaker_parameter_not_break_rabbitmq
    super(self.__class__, self). \
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.ha_neutron_destructive/fuelweb_test/tests/tests_strength/test_failover_base.py", line 1296, in change_pacemaker_parameter_not_break_rabbitmq
    timeout_msg='All nodes are staying in the cluster')
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/devops/helpers/helpers.py", line 100, in wait
    raise TimeoutError(timeout_msg)
TimeoutError: All nodes are staying in the cluster
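The traceback ends in `devops.helpers.helpers.wait`, which raises `TimeoutError(timeout_msg)` when the condition the test is waiting for never becomes true. The following is a minimal sketch of that polling pattern. It is not the actual devops implementation: `WaitTimeoutError` and the `interval`/`timeout` parameter names are assumptions, and only `timeout_msg` comes from the traceback.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical stand-in for the TimeoutError raised by devops' wait()."""


def wait(predicate, interval=5, timeout=60, timeout_msg="Waiting timed out"):
    """Poll predicate() every `interval` seconds.

    Returns the first truthy result; raises WaitTimeoutError(timeout_msg)
    once `timeout` seconds have elapsed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeoutError(timeout_msg)
        time.sleep(interval)
```

In the failing run, the predicate never returned true before the deadline, so the message "All nodes are staying in the cluster" was raised.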

From the logs we see:
>>No running rabbitmq nodes found on slave-03

>>Cluster status of node 'rabbit@messaging-node-1' ...
[{nodes,[{disc,['rabbit@messaging-node-1','rabbit@messaging-node-2','rabbit@messaging-node-3']}]},{alarms,[{'rabbit@messaging-node-2',[]},{'rabbit@messaging-node-3',[]}]}]
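As captured in the log, the cluster status above lists three disc nodes but contains no `running_nodes` entry. The sketch below pulls both lists out of output in this classic Erlang-term format. It assumes the format shown in the log and uses regular expressions rather than a real Erlang term parser. The function name `parse_cluster_status` is hypothetical.

```python
import re


def parse_cluster_status(output):
    """Extract node names from `rabbitmqctl cluster_status` text.

    Assumes the Erlang-term layout seen in the log above; a section that
    is absent (e.g. no running_nodes) yields an empty list.
    """
    def names_in(section_pattern):
        match = re.search(section_pattern, output)
        if not match:
            return []
        # Node names are single-quoted atoms such as 'rabbit@host'.
        return re.findall(r"'([^']+)'", match.group(1))

    return {
        "disc": names_in(r"\{disc,\[([^\]]*)\]\}"),
        "running_nodes": names_in(r"\{running_nodes,\[([^\]]*)\]\}"),
    }
```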

>>Unexpected exit_code returned: actual 143, expected 0 69 70 75. Command: 'rabbitmqctl cluster_status' Details:
Host: 10.109.5.5
Command: 'rabbitmqctl cluster_status'
Exit code: 143

We see that the RabbitMQ cluster is not available after reverting the snapshot of the pre-deployed cluster with 3 controllers.
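`rabbitmqctl cluster_status` exited with 143, which is not one of the accepted codes (0, 69, 70, 75). By the usual shell convention, a status above 128 means the process was killed by signal (status - 128). Under that convention, 143 points to SIGTERM (15). This suggests the command was terminated rather than failing on its own, although the log does not show what sent the signal. A small helper for decoding such statuses, written as an illustration only:

```python
import signal


def describe_exit_code(code):
    """Interpret a shell-style exit status.

    Values above 128 conventionally mean "killed by signal (code - 128)".
    """
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = "unknown signal %d" % (code - 128)
        return "terminated by %s" % name
    return "exited with status %d" % code
```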

Steps to reproduce:
            1. Deploy an environment with at least 3 controllers <<<< failed here
            2. Change the max_rabbitmqctl_timeouts parameter on one of the
               controllers; after that, the RabbitMQ slaves will be
               restarted by Pacemaker.
            3. Wait for 3 minutes.
            4. Check that the RabbitMQ cluster is assembled, retrying until
               success for up to 10 minutes.
            5. Run OSTF.
            6. Repeat steps 2-5 two more times.
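Steps 2-6 above can be sketched as a loop. This assumes the environment from step 1 is already deployed. The three callables (`change_parameter`, `cluster_assembled`, `run_ostf`) are hypothetical placeholders for the real fuel-qa actions. The clock and sleep functions are injectable so the timing can be exercised without actually waiting.

```python
import time


def run_scenario(change_parameter, cluster_assembled, run_ostf,
                 iterations=3, settle_seconds=180, assemble_timeout=600,
                 poll_interval=10, sleep=time.sleep, clock=time.monotonic):
    """Run steps 2-5 `iterations` times (step 6 = two extra repeats)."""
    for iteration in range(1, iterations + 1):
        change_parameter()                 # step 2: Pacemaker restarts slaves
        sleep(settle_seconds)              # step 3: wait 3 minutes
        deadline = clock() + assemble_timeout
        while not cluster_assembled():     # step 4: retry for up to 10 min
            if clock() >= deadline:
                raise RuntimeError("RabbitMQ cluster not assembled "
                                   "(iteration %d)" % iteration)
            sleep(poll_interval)
        run_ostf()                         # step 5
```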

Expected results:
  The RabbitMQ cluster is available after reverting the environment prepared by the prepare_ha_neutron test.

Actual result:
    TimeoutError: All nodes are staying in the cluster
Reproducibility:
    https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_destructive/137/consoleFull

Workaround:
 -
Impact:
 swarm
Description of the environment:
cat /etc/fuel_build_id:
 465
cat /etc/fuel_build_number:
 465
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8454.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8454.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-mirror-9.0.0-1.mos140.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-openstack-metadata-9.0.0-1.mos8742.noarch
 fuel-notify-9.0.0-1.mos8454.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8454.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8742.noarch
 fuel-library9.0-9.0.0-1.mos8454.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos935.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-nailgun-9.0.0-1.mos8742.noarch
Additional information:

Tags: area-qa
Revision history for this message
Artem Hrechanychenko (agrechanichenko) wrote :
tags: added: area-qa
removed: high
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Andrey Sledzinskiy (asledzinskiy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/329319

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/329389

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/329319
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=fdf57d26289928730c7135d3b411b4c9bacb8558
Submitter: Jenkins
Branch: master

commit fdf57d26289928730c7135d3b411b4c9bacb8558
Author: asledzinskiy <email address hidden>
Date: Tue Jun 14 11:33:17 2016 +0300

    Fix rabbit nodes count after pacemaker changes

    - When we change pacemaker parameter all nodes are leaving
    cluster so expected number of rabbit nodes is 0

    Change-Id: I3bf19c2de1eaaa9b1cd1a526fd11ad5773c2d529
    Closes-Bug: #1591261
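The fix changes what the test waits for. After the Pacemaker parameter change, all nodes leave the cluster, so the expected number of running rabbit nodes is 0. A sketch of such a wait condition follows. It assumes the running node lists have already been collected from each controller. The function name and input shape are hypothetical and not taken from the fuel-qa code.

```python
def all_rabbit_nodes_left(running_nodes_by_host, expected_count=0):
    """Hypothetical wait predicate.

    running_nodes_by_host maps a controller hostname to the list of running
    RabbitMQ node names it reports. Per the fix, every node leaves the
    cluster after the Pacemaker parameter change, so the default expected
    count is 0.
    """
    running = set()
    for nodes in running_nodes_by_host.values():
        running.update(nodes)
    return len(running) == expected_count
```

A predicate like this would be polled until it returns true, with "All nodes are staying in the cluster" as the timeout message.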

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/329389
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=4181d3cfcefa313e2b7c9435ca10be847e72052c
Submitter: Jenkins
Branch: stable/mitaka

commit 4181d3cfcefa313e2b7c9435ca10be847e72052c
Author: asledzinskiy <email address hidden>
Date: Tue Jun 14 11:33:17 2016 +0300

    Fix rabbit nodes count after pacemaker changes

    - When we change pacemaker parameter all nodes are leaving
    cluster so expected number of rabbit nodes is 0

    Change-Id: I3bf19c2de1eaaa9b1cd1a526fd11ad5773c2d529
    Closes-Bug: #1591261

Revision history for this message
Nastya Urlapova (aurlapova) wrote :
Revision history for this message
Andrey Lavrentyev (alavrentyev) wrote :

Looks like this is a regression, or the bug hasn't been fixed completely.

Failure on Swarm: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/57/testReport/%28root%29/change_pacemaker_parameter_not_break_rabbitmq/

Version info:

9.1 snapshot #251

UBUNTU_MIRROR_ID=ubuntu-2016-08-03-174238
CENTOS_MIRROR_ID=centos-7.2.1511-2016-05-31-083834
MOS_UBUNTU_MIRROR_ID=9.0-2016-09-11-182323
MOS_CENTOS_OS_MIRROR_ID=os-2016-06-23-135731
MOS_CENTOS_PROPOSED_MIRROR_ID=proposed-2016-09-11-232321
MOS_CENTOS_UPDATES_MIRROR_ID=updates-2016-06-23-135916
MOS_CENTOS_HOLDBACK_MIRROR_ID=holdback-2016-06-23-140047
MOS_CENTOS_HOTFIX_MIRROR_ID=hotfix-2016-07-18-162958
MOS_CENTOS_SECURITY_MIRROR_ID=security-2016-06-23-140002

Revision history for this message
Nastya Urlapova (aurlapova) wrote :