tripleo-ci-centos-8-standalone-upgrade-ussuri intermittently times out while running tempest tests

Bug #1903993 reported by Sandeep Yadav
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

Description:
=============

tripleo-ci-centos-8-standalone-upgrade-ussuri intermittently times out while running tempest tests. The issue is not continuous.

Build history:
=============
https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-standalone-upgrade-ussuri

The timeout has been observed on different tempest tasks:

https://d8bc0be68ddb2c0c69e0-fd01ea3c8a41edbe7950db7c1f6ff671.ssl.cf2.rackcdn.com/761745/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/4e3340b/job-output.txt
~~~
2020-11-12 00:17:07.372379 | primary | TASK [os_tempest : Generate test-list file] ************************************
2020-11-12 00:17:07.374135 | primary | Thursday 12 November 2020 00:17:07 +0000 (0:00:01.365) 1:35:53.024 *****
2020-11-12 00:17:14.660588 | primary | ok: [undercloud]
2020-11-12 00:17:15.056406 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
2020-11-12 00:17:15.057044 | POST-RUN START: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/post.yaml@master]
~~~

https://689acbf7b1bc82ef50ab-1d2e05dc3ab5801f52a161e4cf3cffa5.ssl.cf2.rackcdn.com/762287/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/35461a8/job-output.txt

~~~
2020-11-12 11:19:15.571082 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2020-11-12 11:19:15.572950 | primary | Thursday 12 November 2020 11:19:15 +0000 (0:00:02.370) 1:16:04.447 *****
2020-11-12 11:28:49.695998 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
2020-11-12 11:28:49.699006 | POST-RUN START: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/post.yaml@master]
2020-11-12 11:28:52.501920 |
~~~

Tags: alert
Revision history for this message
Marios Andreou (marios-b) wrote :

It looks like it is _just_ timing out. Looking at https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-standalone-upgrade-ussuri, the successful runs all take ~ 3 hours, which is the job timeout.

We may need to change something in the job. I don't think tempest is to blame here; the tests are not taking that long: https://689acbf7b1bc82ef50ab-1d2e05dc3ab5801f52a161e4cf3cffa5.ssl.cf2.rackcdn.com/762287/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/35461a8/logs/stackviz/index.html#/testrepository.subunit/timeline?test=tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_connectivity_between_vms_on_different_networks

We run tempest both after deployment and after upgrade. In the logs from the description, the timeout happens in the tempest run after the upgrade.

However, the tempest run after deployment only takes ~ 19 mins (09:43:41 to 10:02:53):

https://689acbf7b1bc82ef50ab-1d2e05dc3ab5801f52a161e4cf3cffa5.ssl.cf2.rackcdn.com/762287/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/35461a8/job-output.txt

2020-11-12 09:40:11.889457 | primary | PLAY [Validate the deployment] *************************************************
2020-11-12 09:43:41.044614 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2020-11-12 09:43:41.044716 | primary | Thursday 12 November 2020 09:43:41 +0000 (0:00:00.092) 0:55:18.928 *****
2020-11-12 10:02:53.300626 | primary | ok: [undercloud]
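
For anyone who wants to check the duration of a task from job-output.txt, here is a minimal sketch. It assumes every line starts with a `YYYY-MM-DD HH:MM:SS.ffffff` stamp followed by ` | `, as in the excerpts quoted in this report. The helper names are made up for illustration and are not part of any tripleo or Zuul tooling.

```python
from datetime import datetime

# Two lines copied from the job output quoted above (asterisk padding trimmed).
LOG = """\
2020-11-12 09:43:41.044614 | primary | TASK [os_tempest : Execute tempest tests]
2020-11-12 10:02:53.300626 | primary | ok: [undercloud]
"""


def zuul_timestamp(line):
    """Parse the leading timestamp of a job-output line (hypothetical helper)."""
    stamp = line.split(" | ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def elapsed_between(first_line, last_line):
    """Wall-clock time between two job-output lines."""
    return zuul_timestamp(last_line) - zuul_timestamp(first_line)


task_start, task_done = LOG.splitlines()
tempest_after_deploy = elapsed_between(task_start, task_done)
print(tempest_after_deploy)  # 0:19:12.256012
```

So the tempest run after deployment took a bit over 19 minutes in this build.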

Marios Andreou (marios-b) wrote :

just discussed in a call with rlandy

The standalone upgrade job runs deploy, then tempest, then upgrade, then tempest again.

we are going to suggest removing the first tempest execution after deployment.
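
A rough back-of-the-envelope check of what that would buy, using only the timestamps from build 35461a8 quoted earlier in this report (rounded to whole seconds). It assumes, without evidence from the logs, that the rest of the job would take the same time if the first tempest run were skipped.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def at(stamp):
    return datetime.strptime(stamp, FMT)


# Timestamps taken from the job-output.txt excerpts for build 35461a8.
first_tempest_start = at("2020-11-12 09:43:41")   # tempest after deployment begins
first_tempest_end = at("2020-11-12 10:02:53")     # tempest after deployment reports ok
second_tempest_start = at("2020-11-12 11:19:15")  # tempest after upgrade begins
timed_out_at = at("2020-11-12 11:28:49")          # RUN END RESULT_TIMED_OUT

first_tempest = first_tempest_end - first_tempest_start
left_for_second = timed_out_at - second_tempest_start
# Assumption: skipping the first run shifts everything after it earlier by its duration.
left_if_first_skipped = left_for_second + first_tempest

print("first tempest run:       ", first_tempest)          # 0:19:12
print("time left for second run:", left_for_second)        # 0:09:34
print("time left if first skipped:", left_if_first_skipped)  # 0:28:46
```

In that build the second tempest run had under 10 minutes before the timeout. Without the first run it would have had close to 29 minutes, which is more than the first run needed.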

I will post something in a few mins to do that.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/762674

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/762674
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=f9ff7116d675d8c9fa042c3d64b123784ab47961
Submitter: Zuul
Branch: master

commit f9ff7116d675d8c9fa042c3d64b123784ab47961
Author: MSA <email address hidden>
Date: Fri Nov 13 17:34:48 2020 +0200

    Skip running tempest after deployment for standalone upgrade jobs

    Per the related bug, the standalone upgrade is running too close
    to 3 hours and times out. Let's try to reduce that by removing
    the tempest after deployment with a conditional. It will still
    run tempest after the upgrade.

    Change-Id: I58b817336c77808b676783e75eee14255896ee84
    Related-Bug: 1903993

Marios Andreou (marios-b) wrote :

The change to skip tempest after deployment at https://review.opendev.org/762674 was merged yesterday afternoon.

The topmost results from https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-8-standalone-upgrade-ussuri from the time after 762674 merged (attached here) show a lot more runs in the "2" hour range than the "3" hour range.
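
The same comparison can be sketched in code rather than read off the page. This assumes the Zuul REST API exposes the builds list under `/api/tenant/openstack/builds` and that each build record carries a `duration` field in seconds; both are assumptions that should be checked against the Zuul API documentation. The sample records below are made up for illustration and are not real build results.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: API counterpart of the web UI builds query used in this report.
API_URL = (
    "https://zuul.opendev.org/api/tenant/openstack/builds"
    "?job_name=tripleo-ci-centos-8-standalone-upgrade-ussuri"
)


def fetch_builds(url=API_URL):
    """Download the build list as JSON (not called below; needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def hours_histogram(builds):
    """Count builds by whole hours of run time, e.g. 2 covers 2h00m to 2h59m."""
    counts = Counter()
    for build in builds:
        duration = build.get("duration")  # assumed to be seconds, may be missing
        if duration is None:
            continue
        counts[int(duration // 3600)] += 1
    return counts


# Made-up sample records, only to show the shape of the result.
SAMPLE = [
    {"duration": 9300.0, "result": "SUCCESS"},
    {"duration": 10860.0, "result": "TIMED_OUT"},
    {"duration": 8700.0, "result": "SUCCESS"},
    {"duration": None, "result": None},
]
print(dict(hours_histogram(SAMPLE)))  # {2: 2, 3: 1}
```

Running `hours_histogram(fetch_builds())` against builds from before and after 762674 merged would give the "2" versus "3" counts directly.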

I am going to move this to Fix Released for now. Please move it back if you disagree and this is still happening. Thanks.

Changed in tripleo:
status: Triaged → Fix Released