[systest] Sometimes test 'auto_cic_maintenance_mode' fails because services don't become ready within 25 minutes after reboot

Bug #1543684 reported by Artem Panchenko
Affects             Status        Importance  Assigned to     Milestone
Fuel for OpenStack  Fix Released  Medium      Sergey Novikov
8.0.x               Won't Fix     Medium      Sergey Novikov
Mitaka              Fix Released  Medium      Sergey Novikov

Bug Description

Steps to reproduce:

    Sometimes test 'auto_cic_maintenance_mode' fails because the RabbitMQ cluster and Swift backends don't become ready within 25 minutes after the last controller reboot

Actual result:

AssertionError: Failed 3 OSTF tests; should fail 0 tests. Names of failed tests:
  - Check state of haproxy backends on controllers (failure) Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
  - RabbitMQ availability (failure) Number of RabbitMQ nodes is not equal to number of cluster nodes.
  - RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.109.23.8 from controller node! Please refer to OpenStack logs for more details.

root@node-2:~# haproxy-status.sh | grep DOWN
swift node-3 Status: DOWN/L7TOUT Sessions: 0 Rate: 0
swift node-2 Status: DOWN/L7TOUT Sessions: 0 Rate: 0

http://paste.openstack.org/show/486465/

But after some time (10-20 minutes) the environment starts working fine and all health checks pass.

root@node-2:~# haproxy-status.sh | grep DOWN
root@node-2:~#
root@node-2:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@messaging-node-2' ...
[{nodes,[{disc,['rabbit@messaging-node-2','rabbit@messaging-node-3',
                'rabbit@messaging-node-5']}]},
 {running_nodes,['rabbit@messaging-node-3','rabbit@messaging-node-5',
                 'rabbit@messaging-node-2']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
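
The failing "RabbitMQ availability" OSTF check above boils down to comparing the nodes registered in the cluster against `running_nodes` in this output. A minimal sketch of that comparison over `rabbitmqctl cluster_status` output (the parsing helper below is hypothetical, not the actual OSTF code):

```shell
# Hypothetical helper, not the actual OSTF code: count 'rabbit@...' atoms
# inside one section ("nodes" or "running_nodes") of `rabbitmqctl cluster_status`.
count_nodes() {
    section=$1
    # Flatten the Erlang term onto one line, cut out the requested tuple
    # (everything up to its first closing bracket), then count node atoms.
    tr -d '\n' | grep -o "{${section},[^]]*" | grep -o "rabbit@[A-Za-z0-9-]*" | wc -l
}

# Sketch of the check itself:
#   registered=$(rabbitmqctl cluster_status | count_nodes nodes)
#   running=$(rabbitmqctl cluster_status | count_nodes running_nodes)
#   [ "$registered" -eq "$running" ] || echo "RabbitMQ cluster is degraded"
```

When a controller is still recovering from maintenance mode, its node is present under `nodes` but missing from `running_nodes`, which is exactly the "Number of RabbitMQ nodes is not equal to number of cluster nodes" failure quoted above.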

We need to investigate why recovering takes so much time after enabling/disabling maintenance mode.

Expected result: all OSTF tests pass after 'auto_cic_maintenance_mode'
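
Since the backends do recover on their own, the general way to stabilize such a test is to replace the one-shot haproxy check with a polling loop that waits for recovery. A sketch of that idea (helper names, timeouts, and the reuse of `haproxy-status.sh` are assumptions, not the merged fuel-qa code):

```shell
# Hypothetical sketch, not the merged fuel-qa change: instead of asserting
# once that no backend is DOWN, poll until recovery or a timeout.

# Count DOWN backends in haproxy-status.sh-style output read from stdin.
count_down() {
    grep -c 'Status: DOWN' || true   # grep -c prints 0 but exits 1 on no match
}

# Poll haproxy-status.sh every 10s until no backend is DOWN; give up after $1 seconds.
wait_for_backends() {
    timeout=${1:-1500}   # default 25 minutes, matching the test's original budget
    interval=10
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        [ "$(haproxy-status.sh | count_down)" -eq 0 ] && return 0
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```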

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
Changed in fuel:
status: New → Confirmed
tags: removed: non-release
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel QA telco (fuel-qa-telco)
description: updated
Changed in fuel:
milestone: 9.0 → 10.0
tags: added: swarm-blocker
Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

For now this issue is blocked by another one: https://bugs.launchpad.net/fuel/+bug/1572574

Revision history for this message
Ksenia Svechnikova (kdemina) wrote :

This issue appears in only one test case, so I don't think it is a "swarm-blocker".

tags: removed: swarm-blocker
Revision history for this message
Dmitry Kalashnik (dkalashnik) wrote :
Changed in fuel:
assignee: Fuel QA telco (fuel-qa-telco) → Sergey Novikov (snovikov)
Revision history for this message
Sergey Novikov (snovikov) wrote :

The fix for https://bugs.launchpad.net/fuel/+bug/1583554 was merged yesterday evening; we need to wait for the results of the swarm run.

tags: added: swarm-fail
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/327779

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Sergey Novikov (snovikov) wrote :

This bug wasn't reproduced during the latest swarm runs:
https://product-ci.infra.mirantis.net/view/9.0_swarm/job/9.0.system_test.ubuntu.cic_maintenance_mode/136/
https://product-ci.infra.mirantis.net/view/9.0_swarm/job/9.0.system_test.ubuntu.cic_maintenance_mode/137/

I tried to catch this issue several times and, as a result, reproduced it just once, but unfortunately I didn't find the root cause: after 5 minutes the cluster works fine. Maybe it's related to the workload and capacity of the Jenkins slave.

I propose decreasing the priority and removing the "swarm-fail" tag.

Changed in fuel:
importance: High → Medium
tags: removed: swarm-fail
tags: added: swarm-fail
tags: added: tricky
Revision history for this message
Sergey Novikov (snovikov) wrote :

After the investigation I've updated the ETA:

ETA for fix: 17.06.2016
ETA for porting into 9.0: 21.06.2016

Revision history for this message
Dmitriy Kruglov (dkruglov) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/327779
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=505c56daaf22fcdddb55c2c8fbdb104f1bb2eb69
Submitter: Jenkins
Branch: master

commit 505c56daaf22fcdddb55c2c8fbdb104f1bb2eb69
Author: Sergey Novikov <email address hidden>
Date: Thu Jun 9 18:38:05 2016 +0300

    Stabilize cic maintenance tests

    Change-Id: I0cbb8f73bb18c0b13a95b230789d73d4a32fa15f
    Closes-Bug: #1543684

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/331003

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/331003
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=b4e596e724540303c6563a6e180fba75b93cfe96
Submitter: Jenkins
Branch: stable/mitaka

commit b4e596e724540303c6563a6e180fba75b93cfe96
Author: Sergey Novikov <email address hidden>
Date: Thu Jun 9 18:38:05 2016 +0300

    Stabilize cic maintenance tests

    Change-Id: I0cbb8f73bb18c0b13a95b230789d73d4a32fa15f
    Closes-Bug: #1543684
    (cherry picked from commit 505c56daaf22fcdddb55c2c8fbdb104f1bb2eb69)

Revision history for this message
Sergey Novikov (snovikov) wrote :

The fix wasn't ported to the stable/8.0 branch due to big differences between stable/8.0 and stable/mitaka. An attempted backport would impact many parts of fuel-qa, not only the cic maintenance tests.

Changed in fuel:
status: Fix Committed → Fix Released