object-storage backend L7 check in down state after deployment

Bug #1570805 reported by Tatyanka on 2016-04-15
Affects: Fuel for OpenStack
Importance: High
Assigned to: Alexandr Kostrikov
Milestone: Mitaka

Bug Description

WARNING!
Not every "Some haproxy backend failed" error belongs to this bug.

Please check the OSTF log from the snapshot: it must contain the message 'L7 check'.
Bugs with similar-looking descriptions have been mapped to this one by mistake.
2016-05-29 02:34:26 ERROR (nose_storage_plugin) fuel_health.tests.ha.test_haproxy.HAProxyCheck.test_001_check_state_of_backends
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 67, in testPartExecutor
    yield
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 601, in run
    testMethod()
  File "/usr/lib/python2.7/site-packages/fuel_health/tests/ha/test_haproxy.py", line 92, in test_001_check_state_of_backends
    "Step 2 failed: Some haproxy backend has down state.")
  File "/usr/lib/python2.7/site-packages/fuel_health/common/test_mixins.py", line 164, in verify_response_true
    self.fail(message.format(failed_step_msg, msg))
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail
    raise self.failureException(msg)
AssertionError: Step 2 failed: Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
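Per the warning above, triage hinges on whether the OSTF snapshot log contains the string 'L7 check'. A minimal sketch of such a filter (a hypothetical helper, not part of fuel_health; the sample text is illustrative, not verbatim OSTF output):

```python
def is_l7_check_failure(log_text):
    """Return True if the OSTF log text mentions 'L7 check', the marker
    that maps a 'haproxy backend down' error to this bug."""
    return any("L7 check" in line for line in log_text.splitlines())

sample = ("2016-05-29 02:34:26 ERROR (nose_storage_plugin) ...\n"
          "haproxy L7 check on object-storage backend failed\n")
print(is_l7_check_failure(sample))  # True: the marker string is present
```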

Steps to reproduce:
1. Create cluster
2. Add 3 nodes with controller and ceph OSD roles
3. Add 2 nodes with the compute role
4. Deploy the cluster
5. Run OSTF

Actual Result:
After successful deployment, the haproxy test fails with:
 Dead backends ['object-storage node-1 Status: DOWN/L7TOUT Sessions: 0 Rate: 0 ']
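Each entry in the "Dead backends" list follows a fixed layout (backend name, node, Status with a failure code, Sessions, Rate). A hedged sketch of parsing one entry, assuming that layout holds (hypothetical helper, not fuel_health code):

```python
import re

# One dead-backend entry, e.g.:
# "object-storage node-1 Status: DOWN/L7TOUT Sessions: 0 Rate: 0 "
BACKEND_RE = re.compile(
    r"(?P<backend>\S+) (?P<node>\S+) "
    r"Status: (?P<status>\S+)/(?P<check>\S+) "
    r"Sessions: (?P<sessions>\d+) Rate: (?P<rate>\d+)")

def parse_dead_backend(entry):
    """Split one dead-backend string into its fields, or None if malformed."""
    m = BACKEND_RE.search(entry)
    return m.groupdict() if m else None

info = parse_dead_backend(
    "object-storage node-1 Status: DOWN/L7TOUT Sessions: 0 Rate: 0 ")
print(info["check"])  # L7TOUT: the layer-7 health check timed out
```

The failure code is what distinguishes this bug (L7TOUT, a layer-7 check timeout) from connection-level failures (L4CON) seen in some of the similar-looking reports below.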

Expected Result:
OSTF tests pass

Reproducibility:
2 out of 2 attempts

Version of components:
http://paste.openstack.org/show/494199/

Peter Zhurba (pzhurba) on 2016-04-15
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Peter Zhurba (pzhurba)
Changed in fuel:
status: New → Confirmed
tags: added: area-library
Peter Zhurba (pzhurba) wrote:

The problem seems to be in the CI job itself.

On the environment from the last occurrence, provided by tatyana-leontovich, I ran fuel --env 1 health --check ha about 10 times, and every pass completed without any error.

I also could not reproduce the issue on a manually deployed environment.

So we probably need to wait longer for the cluster to come up before running this test.
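The suggestion above, waiting longer for the cluster to come up before running the check, could be sketched as a simple poll-with-timeout loop (a hypothetical helper, not fuel-qa code; the usage line assumes an imaginary all_haproxy_backends_up check):

```python
import time

def wait_until(check, timeout=300, interval=10):
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage before running the OSTF HA suite:
# wait_until(lambda: all_haproxy_backends_up(), timeout=600, interval=30)
```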

(venv-nailgun-tests-2.9)jenkins@ci-slave31:~$ dos.py revert-resume env_master_gate_ostf_update error_gate_ostf_update ; ssh -tt root@10.109.10.2 fuel --env 1 health --check ha
Time synchronization is starting
.......
.......

New time on 'slave-06' = Wed Apr 20 16:23:32 UTC 2016
[ 1 of 7] [success] 'Check state of haproxy backends on controllers' (1.036 s)
[ 2 of 7] [success] 'Check data replication over mysql' (3.14 s)
[ 3 of 7] [success] 'Check if amount of tables in databases is the same on each node' (2.739 s)
[ 4 of 7] [success] 'Check galera environment state' (0.9438 s)
[ 5 of 7] [success] 'Check pacemaker status' (1.182 s)
[ 6 of 7] [success] 'RabbitMQ availability' (8.257 s)
[ 7 of 7] [success] 'RabbitMQ replication' (18.78 s)
Connection to 10.109.10.2 closed.
(venv-nailgun-tests-2.9)jenkins@ci-slave31:~$

Peter Zhurba (pzhurba) on 2016-04-21
Changed in fuel:
assignee: Peter Zhurba (pzhurba) → Tatyanka (tatyana-leontovich)
Changed in fuel:
status: Confirmed → In Progress
tags: added: area-qa
removed: area-library

Reviewed: https://review.openstack.org/311144
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=03c3fd8cf9fde9424d83a8d80932767ccbde3bad
Submitter: Jenkins
Branch: master

commit 03c3fd8cf9fde9424d83a8d80932767ccbde3bad
Author: Tatyana Leontovich <email address hidden>
Date: Fri Apr 29 18:11:59 2016 +0300

    Add check_ceph health into gate_ostf

    In gate test we do not check ceph health after revert,
    that leads to false negative result like in lp1570805. Add it usage here.
    Also enable platfrom tests to run

    Change-Id: I2de88b66978149f535e8c13fd9c402c9ee407a8a
    Closes-Bug: #1570805

Changed in fuel:
status: In Progress → Fix Committed
tags: added: swarm-blocker
Tatyanka (tatyana-leontovich) wrote:

Alexander, I think you should create a separate issue, because this one relates only to the gate tests; the errors look the same, but the root cause appears to be different.

Thanks, Tatyana!

The problem has the same symptoms.
If it really is a new bug, and not just a reproduction in another place, I will file a separate bug.

Changed in fuel:
assignee: Tatyanka (tatyana-leontovich) → MOS Linux (mos-linux)
status: Fix Committed → Confirmed
Changed in fuel:
assignee: MOS Linux (mos-linux) → MOS Ceph (mos-ceph)

Further investigation of the Ceph logs and configuration is needed before this can be assigned to the Ceph team.

Changed in fuel:
assignee: MOS Ceph (mos-ceph) → Fuel QA Team (fuel-qa)
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Alexandr Kostrikov (akostrikov-mirantis)

I have not found any reproductions; I will wait for swarm statistics to see whether it reproduces.

A related fix has been merged (https://bugs.launchpad.net/fuel/newton/+bug/1582646); it may resolve this issue.

description: updated

During the last swarm run there were false positives in the detection of this bug: https://mirantis.testrail.com/index.php?/plans/view/12130
That is due to the misleading error message, which should be made more verbose by https://review.openstack.org/#/c/322867/

It seems the related object-storage fix has resolved this issue: https://bugs.launchpad.net/fuel/newton/+bug/1582646
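The review linked above aims to replace the generic "Some haproxy backend has down state." assertion with a more verbose message. An illustrative sketch of what such a message builder might look like (this is an assumption about the shape of the fix, not the actual fuel-ostf change):

```python
def detailed_failure_message(dead_backends):
    """Build a per-backend failure message instead of the generic
    'Some haproxy backend has down state.'"""
    if not dead_backends:
        return "All haproxy backends are up."
    lines = ["Step 2 failed: %d haproxy backend(s) down:" % len(dead_backends)]
    lines.extend("  - %s" % backend for backend in dead_backends)
    return "\n".join(lines)

print(detailed_failure_message(
    ["mysqld node-5 Status: DOWN/L4CON Sessions: 0 Rate: 0"]))
```

Listing each dead backend directly in the assertion makes it possible to tell an L7TOUT failure (this bug) from unrelated L4CON failures without digging into the debug log.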

Actual errors are:
https://mirantis.testrail.com/index.php?/tests/view/6475064
https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_tun_scale/124/testReport/(root)/neutron_tun_scalability/neutron_tun_scalability/
2016-05-30 03:41:47 DEBUG (test_haproxy) Dead backends ['horizon-ssl node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'keystone-1 node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'keystone-2 node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'nova-api node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'nova-metadata-api node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'cinder-api node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'glance-api node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'glance-glare node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'neutron node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'glance-registry node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'heat-api node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'heat-api-cfn node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'heat-api-cloudwatch node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ', 'nova-novncproxy node-1 Status: DOWN/L4CON Sessions: 0 Rate: 0 ']

https://mirantis.testrail.com/index.php?/tests/view/6475080
https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.controller_replacement/123/testReport/(root)/deploy_ha_neutron_tun_ctrl_replacement/deploy_ha_neutron_tun_ctrl_replacement/
2016-05-30 13:02:19 DEBUG (test_haproxy) Dead backends ['mysqld node-5 Status: DOWN/L4CON Sessions: 0 Rate: 0 ']
2016-05-30 13:02:19 ERROR (nose_storage_plugin) fuel_health.tests.ha.test_haproxy.HAProxyCheck.test_001_check_state_of_backends
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 67, in testPartExecutor
    yield
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 601, in run
    testMethod()
  File "/usr/lib/python2.7/site-packages/fuel_health/tests/ha/test_haproxy.py", line 92, in test_001_check_state_of_backends
    "Step 2 failed: Some haproxy backend has down state.")
  File "/usr/lib/python2.7/site-packages/fuel_health/common/test_mixins.py", line 164, in verify_response_true
    self.fail(message.format(failed_step_msg, msg))
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail
    raise self.failureException(msg)
AssertionError: Step 2 failed: Some haproxy backend has down state.. Please refe...


Changed in fuel:
status: Confirmed → Fix Released

Reviewed: https://review.openstack.org/322867
Committed: https://git.openstack.org/cgit/openstack/fuel-ostf/commit/?id=afc374aa9efc6e9f363d6e4211607bfd9f121b43
Submitter: Jenkins
Branch: master

commit afc374aa9efc6e9f363d6e4211607bfd9f121b43
Author: Alexandr Kostrikov <email address hidden>
Date: Mon May 30 17:57:40 2016 +0300

    Detailed haproxy failure message

    That is aimed to make debug easier and
    faster.

    Change-Id: I5aa633b954e4cd8c6bb7291339c1c5806925c8c9
    Related-bug: #1570805
    Closes-bug: #1587063

Reviewed: https://review.openstack.org/315441
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=7b7311cf6f3d33948f79c362612ba2a5ff211a57
Submitter: Jenkins
Branch: stable/mitaka

commit 7b7311cf6f3d33948f79c362612ba2a5ff211a57
Author: Tatyana Leontovich <email address hidden>
Date: Fri Apr 29 18:11:59 2016 +0300

    Add check_ceph health into gate_ostf

    In gate test we do not check ceph health after revert,
    that leads to false negative result like in lp1570805. Add it usage here.
    Also enable platfrom tests to run

    Change-Id: I2de88b66978149f535e8c13fd9c402c9ee407a8a
    Closes-Bug: #1570805
    (cherry picked from commit 03c3fd8cf9fde9424d83a8d80932767ccbde3bad)

tags: added: in-stable-mitaka