ovb-1ctlr_1comp-featureset020-master job timing out

Bug #1758932 reported by Rafael Folco
This bug affects 1 person
Affects    Status     Importance    Assigned to    Milestone
tripleo    Triaged    Critical      Arx Cruz

Bug Description

The OVB fs020 master job is frequently timing out, with no errors reported.

I could not find the root cause or any evidence that explains these timeouts. However, one task is skipped and then nothing is logged for a very long interval (2-3 hours):
    2018-03-26 02:52:44.587 | skipping: [undercloud]
    2018-03-26 05:43:20.432 |

This rings a bell...

I checked the latest logs and they all show the same behavior:

2018-03-26 02:52:44.566 | TASK [validate-tempest : Verifying bugs in bugzilla and launchpad and generating skip file] ***
2018-03-26 02:52:44.566 | Monday 26 March 2018 02:52:44 +0000 (0:00:00.055) 0:00:21.644 **********
2018-03-26 02:52:44.587 | skipping: [undercloud]
2018-03-26 05:43:20.432 |
2018-03-26 05:43:20.432 | TASK [validate-tempest : Execute tempest] **************************************

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/3641c12/console.txt.gz#_2018-03-26_02_52_44_566
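One way to spot this kind of silent interval without reading the whole console log is to scan for the largest gap between consecutive timestamped lines. The following is only a minimal sketch: it assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in the excerpt above, and the helper names `parse_timestamp` and `largest_gaps` are hypothetical, not part of any CI tooling.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # length of "2018-03-26 02:52:44.587"


def parse_timestamp(line):
    """Return the leading timestamp of a console log line, or None."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def largest_gaps(lines, top=3):
    """Return the `top` largest gaps as (gap, line_before, line_after)."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # ignore lines without a leading timestamp
        if prev_ts is not None:
            gaps.append((ts - prev_ts, prev_line, line))
        prev_ts, prev_line = ts, line
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


# Lines copied from the fs020 excerpt above.
sample = [
    "2018-03-26 02:52:44.566 | TASK [validate-tempest : Verifying bugs in bugzilla and launchpad and generating skip file] ***",
    "2018-03-26 02:52:44.566 | Monday 26 March 2018 02:52:44 +0000 (0:00:00.055) 0:00:21.644 **********",
    "2018-03-26 02:52:44.587 | skipping: [undercloud]",
    "2018-03-26 05:43:20.432 |",
    "2018-03-26 05:43:20.432 | TASK [validate-tempest : Execute tempest] **************************************",
]

gap, before, after = largest_gaps(sample, top=1)[0]
print(gap)     # 2:50:35.845000
print(before)  # the "skipping: [undercloud]" line
```

On a full log, the same function could be fed the decompressed lines of console.txt.gz. Note that the gap only shows where the log went quiet, not which step was actually running during that time.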

The job continues until it is killed by the timeout. The total runtime for the job was about 5h30m:
2018-03-26 01:06:29.844 | Started by user anonymous
2018-03-26 06:33:55.424 | Warning: Permanently added the ECDSA host key for IP address '38.145.32.13' to the list of known hosts.

Note: For the jobs that succeed, there is also a huge interval for the validate-tempest task. However, these jobs were able to complete within the timeout limit:
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/0e974b0
The successful job completed in less than 5h:
2018-03-24 20:11:21.726 | Started by user anonymous
2018-03-25 00:50:10.081 | % Total % Received % Xferd Average Speed Time Time Time Current
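As a quick check of the two runtimes quoted above, the elapsed time can be computed directly from the first and last timestamps of each run. This is a sketch only: it assumes the quoted lines really are the first and last lines of each console log, and `elapsed` is a hypothetical helper.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed(start, end):
    """Return the time between two console log timestamps."""
    return datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)


# Timestamps copied from the two runs quoted above.
timed_out = elapsed("2018-03-26 01:06:29.844", "2018-03-26 06:33:55.424")
succeeded = elapsed("2018-03-24 20:11:21.726", "2018-03-25 00:50:10.081")

print(timed_out)              # 5:27:25.580000
print(succeeded)              # 4:38:48.355000
print(timed_out - succeeded)  # 0:48:37.225000
```

The run that timed out lasted roughly 49 minutes longer than the successful one.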

Note 2: In other jobs (e.g. fs001), the same interval after the skipped task is only about 7 minutes:
2018-03-26 03:14:20.112 | TASK [validate-tempest : Verifying bugs in bugzilla and launchpad and generating skip file] ***
2018-03-26 03:14:20.112 | Monday 26 March 2018 03:14:20 +0000 (0:00:00.060) 0:00:24.711 **********
2018-03-26 03:14:20.132 | skipping: [undercloud]
2018-03-26 03:21:52.818 |
2018-03-26 03:21:52.839 | TASK [validate-tempest : Execute tempest] **************************************
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/3cf4b7b/console.txt.gz#_2018-03-26_03_14_20_112

Tags: ci quickstart
wes hayutin (weshayutin) wrote :

 bump fs20 tempest to 3 workers https://review.openstack.org/556695
 bump ci overcloud flavor for faster job time https://review.openstack.org/556697

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/557354

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/557354
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a0de810b69b60cc6092ae1cb5258c7ccc2f370e1
Submitter: Zuul
Branch: master

commit a0de810b69b60cc6092ae1cb5258c7ccc2f370e1
Author: Rafael Folco <email address hidden>
Date: Wed Mar 28 11:38:18 2018 -0300

    Temporarily add tempest failures to skip list

    fs020 full tempest has performance regressions and concurrency
    issues that are under investigation.

    Change-Id: I883979a78423297a1051c40846c5b0cd798b628c
    Related-Bug: #1759583
    Related-Bug: #1758932

tags: removed: fs020 master ovb
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/556965
