MOS job 5.0.3.staging.ubuntu.bvt_2 failed for 5.0.3

Bug #1414991 reported by Timur Nurlygayanov
This bug affects 1 person
Affects             Status        Importance  Assigned to          Milestone
Fuel for OpenStack  Fix Released  Critical    Aleksandra Fedorova

Bug Description

The staging job 5.0.3.staging.ubuntu.bvt_2 failed with the following traceback:

Job:
http://jenkins-product.srt.mirantis.net:8080/job/5.0.3.staging.ubuntu.bvt_2/62/console

Traceback:

======================================================================
ERROR: Deploy cluster in HA mode with VLAN Manager
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/usr/lib/python2.7/dist-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/usr/lib/python2.7/dist-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/5.0.3.staging.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 59, in wrapper
    "fail", func.__name__)
  File "/home/jenkins/workspace/5.0.3.staging.ubuntu.bvt_2/fuelweb_test/helpers/decorators.py", line 155, in create_diagnostic_snapshot
    task = env.fuel_web.task_wait(env.fuel_web.client.generate_logs(), 60 * 5)
  File "/home/jenkins/workspace/5.0.3.staging.ubuntu.bvt_2/fuelweb_test/__init__.py", line 48, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/5.0.3.staging.ubuntu.bvt_2/fuelweb_test/models/fuel_web_client.py", line 566, in task_wait
    "was exceeded: ".format(task=task["name"], timeout=timeout))
TimeoutError: Waiting task "dump" timeout 300 sec was exceeded:

----------------------------------------------------------------------
Ran 4 tests in 7170.873s
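
For readers unfamiliar with this failure mode: the traceback ends in a polling wait that gave up after 60 * 5 = 300 seconds while the "dump" (diagnostic snapshot) task was still unfinished. The following is only a minimal sketch of such a wait loop, not the actual fuelweb_test implementation. The names wait_for_task and TaskTimeoutError, the get_task callable, and the "running"/"ready" status values are all hypothetical.

```python
import time


class TaskTimeoutError(Exception):
    """Raised when a task does not finish within the allowed time."""


def wait_for_task(get_task, timeout, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_task() until the task stops running or the timeout expires.

    get_task is a hypothetical callable returning a dict with "name" and
    "status" keys; "running" is an assumed status value.
    """
    deadline = clock() + timeout
    while True:
        task = get_task()
        if task["status"] != "running":
            return task
        if clock() >= deadline:
            raise TaskTimeoutError(
                'Waiting task "{name}" timeout {timeout} sec was exceeded'.format(
                    name=task["name"], timeout=timeout))
        sleep(interval)
```

A loop like this reports only that the deadline passed. It cannot say why the task was slow, which is why the root cause here had to be found on the slave itself.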

Tags: devops
Changed in fuel:
assignee: nobody → Fuel DevOps (fuel-devops)
milestone: none → 5.0.3
description: updated
Changed in fuel:
importance: Undecided → High
tags: added: devops
Changed in fuel:
status: New → Confirmed
assignee: Fuel DevOps (fuel-devops) → Aleksandra Fedorova (afedorova)
Dmitry Mescheryakov (dmitrymex) wrote :

Raising to Critical, since this happened in the 5.0.3.test_staging_mirror job.

Changed in fuel:
importance: High → Critical
Aleksandra Fedorova (bookwar) wrote :

1) The failures of http://jenkins-product.srt.mirantis.net:8080/job/5.0.3.staging.ubuntu.bvt_2/62/ and http://jenkins-product.srt.mirantis.net:8080/job/5.0.3.staging.ubuntu.bvt_2/63/ were caused by disk space issues on the mc2n4 Jenkins slave.

Monitoring did not catch this because it was misconfigured.

We cleaned up the disk space and fixed the monitoring. After a retrigger, the job completed successfully:
http://jenkins-product.srt.mirantis.net:8080/job/5.0.3.staging.ubuntu.bvt_2/64/
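
As an illustration of the kind of check that could catch this class of failure before a job starts, here is a minimal sketch using the standard library's shutil.disk_usage. It is not the monitoring that was actually fixed. The function name free_space_ok and the threshold are assumptions.

```python
import shutil


def free_space_ok(path, min_free_bytes, disk_usage=shutil.disk_usage):
    """Return True if the filesystem holding `path` has enough free space.

    min_free_bytes is a hypothetical threshold; choose it from how much
    space a single test run actually needs on the slave.
    """
    usage = disk_usage(path)
    return usage.free >= min_free_bytes
```

For example, a pre-build step could call free_space_ok("/home/jenkins/workspace", 50 * 1024 ** 3) and fail fast with a clear message, instead of letting the run time out later in an unrelated step. The 50 GiB figure is made up for the example.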

2) But then we hit a second problem:
The 5.0.3.test_staging_mirror job failed on its own timeout, even though both bvt tests passed:
http://jenkins-product.srt.mirantis.net:8080/view/5.0.3/job/5.0.3.test_staging_mirror/87/

The timeout for the 5.0.3.test_staging_mirror job was set to 300 minutes. That appears to be too short for this job, because bvt tests can wait in the queue for some time before they actually start.

Measures to prevent such failures:
 - increased the timeout for test_staging_mirror to 500 minutes (every bvt job has its own separate timeout anyway)
 - always keep some servers reserved exclusively for bvt tests:
        currently I've allocated two: mc2n1-srt and mc2n2-srt
 - get more servers:
        we expect 10 more servers in the near future
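
The reasoning behind the larger timeout can be sketched as simple arithmetic: the parent job has to outlast the slowest child job plus the time that child spends waiting in the queue. The sketch below assumes the child bvt jobs run in parallel. The function name and all numbers in the usage note are hypothetical, not values taken from the Jenkins configuration.

```python
def parent_timeout_minutes(child_timeouts, max_queue_wait, margin=0):
    """Smallest safe timeout for a parent job that waits on child jobs.

    Assumes the children run in parallel, so the slowest child dominates.
    child_timeouts: per-child timeouts in minutes.
    max_queue_wait: worst-case minutes a child waits for a free server.
    margin: extra minutes for the parent's own setup and teardown.
    """
    return max(child_timeouts) + max_queue_wait + margin
```

With made-up values, parent_timeout_minutes([180, 180], 150) gives 330 minutes, which already exceeds a 300-minute limit even though every child finishes within its own timeout. Reserving servers for bvt tests attacks the max_queue_wait term; raising the parent timeout covers what remains.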

3) Retriggered one more time with the new timeout; let's wait for the results.

Changed in fuel:
status: Confirmed → In Progress
Aleksandra Fedorova (bookwar) wrote :

Fixed

Changed in fuel:
status: In Progress → Fix Released