promote featureset020 failing random tempest tests

Bug #1755834 reported by John Trowbridge
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: John Trowbridge

Bug Description

We currently see this on the stable/queens promote jobs [1]; however, it may show up on master as well once we have those jobs running again.

It is not clear yet what the root cause is, but the last 3 runs have all failed during tempest, each on different tests [2][3][4]. The failures look like the kind one would see when there are simply not enough resources to run tempest at that concurrency. We only have 2 workers, though, so the situation cannot be improved much by lowering the concurrency.

Another possible solution would be to use a bigger flavor specifically for this job; however, it would be good to understand more about the actual root cause before doing that.
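
For anyone reproducing this locally, concurrency can be capped when invoking the tempest CLI directly. A minimal sketch (the CI job drives tempest through quickstart, so this is not the exact command it runs):

    # Check how many CPUs the undercloud actually has; the dynamic default
    # derives the worker count from this.
    nproc
    # Re-run the smoke set with an explicit, low worker count to rule out
    # resource contention (use --serial for a single worker).
    tempest run --smoke --concurrency 2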

Reproducer script: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/1107bd8/reproducer-quickstart.sh

[1] builds 78-80: https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/
[2] https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/8857d09/undercloud/home/jenkins/tempest/tempest.html.gz
[3] https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/3f49d08/undercloud/home/jenkins/tempest/tempest.html.gz
[4] https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/1107bd8/tempest.html.gz

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.openstack.org/553019

Changed in tripleo:
assignee: nobody → John Trowbridge (trown)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/553019
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=b3872ce26bcd61020fbf2d7a7b26c939094dcf5b
Submitter: Zuul
Branch: master

commit b3872ce26bcd61020fbf2d7a7b26c939094dcf5b
Author: John Trowbridge <email address hidden>
Date: Wed Mar 14 15:26:14 2018 -0400

    Explicitly set tempest workers for full tempest featureset

    We run full tempest in featureset020, which is a bit resource
    intensive for CI. If we use the dynamic approach of setting
    concurrency to half the CPUs, we end up with random failures.

    This patch instead changes the concurrency to 2 only for this
    featureset.

    Change-Id: I6653d6b4851deb5fb64ffc3734a5a4556b51c57a
    Closes-Bug: #1755834
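
To make the before/after concrete: the dynamic default sizes the worker pool from the CPU count, while the change above pins it for featureset020. A rough shell illustration (not the exact invocation the quickstart tempest role generates):

    # Dynamic sizing (roughly what the old behaviour amounts to):
    tempest run --concurrency "$(( $(nproc) / 2 ))"
    # Pinned to two workers for featureset020 after this change:
    tempest run --concurrency 2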

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
John Trowbridge (trown) wrote :

Moving this back to triaged. I am not sure if the concurrency reduction helped.

We are now failing a large number of tests, and I am not sure whether they are legitimate failures or whether reducing the concurrency actually made things worse:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-queens/c8f1d3c/tempest.html.gz

Changed in tripleo:
status: Fix Released → Triaged
Revision history for this message
John Trowbridge (trown) wrote :

This did fix the random 2-3 failures we were seeing, but then we got a new bug related to openvswitch not being started on the compute nodes:

https://bugs.launchpad.net/tripleo/+bug/1757111

Since we have a different bug to track that, I will close this one.

Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart 2.1.1

This issue was fixed in the openstack/tripleo-quickstart 2.1.1 release.
