periodic centos8 standalone-full-tempest-tempest-master timeout

Bug #1867945 reported by Marios Andreou
This bug affects 1 person
Affects | Status       | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical   | Unassigned  |

Bug Description

This blocks the tempest component promotions: in [1][2][3] the periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master job times out during the tempest test run. In [2] and [3] this is followed by a timeout during log collection, so there are no logs to confirm, but in [1] the tempest logs [4] show an error like:

        Captured traceback:
        ~~~~~~~~~~~~~~~~~~~
            Traceback (most recent call last):
              File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
                return f(*func_args, **func_kwargs)
              File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_device_tagging.py", line 255, in test_tagged_boot_devices
                'boot_index': 2
              File "/usr/lib/python3.6/site-packages/tempest/api/compute/base.py", line 263, in create_test_server
                **kwargs)
              File "/usr/lib/python3.6/site-packages/tempest/common/compute.py", line 271, in create_test_server
                server['id'])
              File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
                self.force_reraise()
              File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
                six.reraise(self.type_, self.value, self.tb)
              File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
                raise value
              File "/usr/lib/python3.6/site-packages/tempest/common/compute.py", line 242, in create_test_server
                clients.servers_client, server['id'], wait_until)
              File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
                raise lib_exc.TimeoutException(message)
            tempest.lib.exceptions.TimeoutException: Request timed out
            Details: (TaggedBootDevicesTest:test_tagged_boot_devices) Server 9beff40a-a0c6-474f-9854-9d38dd7a9f4a failed to reach ACTIVE status and task state "None" within the required time (300 s). Current status: BUILD. Current task state: spawning.
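In essence, this error comes from a poll-until-deadline loop: the test keeps reading the server's status and task state and gives up once the 300 s budget is spent while the server is still in BUILD/spawning. A minimal sketch of that pattern follows. It is not tempest's `waiters.wait_for_server_status` code; `wait_for_status`, `get_state` and the local `TimeoutException` are hypothetical names, and the clock and sleep functions are injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, Optional, Tuple


class TimeoutException(Exception):
    """Hypothetical stand-in for tempest.lib.exceptions.TimeoutException."""


def wait_for_status(
    get_state: Callable[[], Tuple[str, Optional[str]]],
    wanted_status: str = "ACTIVE",
    wanted_task_state: Optional[str] = None,
    timeout: float = 300.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it returns (wanted_status, wanted_task_state).

    get_state is a hypothetical callable returning (status, task_state),
    e.g. ("BUILD", "spawning"). Raises TimeoutException once `timeout`
    seconds have elapsed without reaching the wanted state.
    """
    start = clock()
    while True:
        status, task_state = get_state()
        if status == wanted_status and task_state == wanted_task_state:
            return
        if clock() - start >= timeout:
            raise TimeoutException(
                f"failed to reach {wanted_status} status and task state "
                f'"{wanted_task_state}" within the required time ({timeout:g} s). '
                f"Current status: {status}. Current task state: {task_state}."
            )
        sleep(interval)
```

With a `get_state` that keeps returning `("BUILD", "spawning")`, this raises with the same kind of message as the traceback above.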

The promotion criteria for the tempest component can be found here [5].

[1] https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master/b45b0e3/job-output.txt
[2] https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master/3d2436b/job-output.txt
[3] https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master/71fb8b6/job-output.txt
[4] https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master/b45b0e3/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
[5] https://github.com/rdo-infra/ci-config/blob/b3df246ddca2fd437dfa38ad03931618d78f24c9/ci-scripts/dlrnapi_promoter/config/CentOS-8/component/master.yaml#L65

Marios Andreou (marios-b) wrote:

Some more tempest-related failures today in the compute and tempest components; they fail during the tempest run and there are no logs at:

*=* 12:36:22 *=*=*= " * openstack-component-tempest "

        * https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-tempest-master/de4fa04/job-output.txt

*=* 12:29:48 *=*=*= " openstack-component-compute "

        * https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-compute-master/d55ae94/job-output.txt

Not clear whether that is related or not, because there are no logs :/

Marios Andreou (marios-b) wrote:

And the same failures again today at:

*=* 10:14:20 *=*=*= " * openstack-component-tempest "

        * https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master/c8fe81d/job-output.txt
        * 2020-03-19 18:51:46.673195 | primary | TASK [os_tempest : Execute tempest tests] **************************************
        2020-03-19 18:51:46.698385 | primary | Thursday 19 March 2020 18:51:46 +0000 (0:00:00.086) 0:40:18.262 ********
        2020-03-19 22:03:31.184555 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]

*=* 10:12:51 *=*=*= " * openstack-component-compute "

        * https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-compute-master/07f96f9/job-output.txt
        * 2020-03-19 08:17:53.641532 | primary | TASK [os_tempest : Execute tempest tests] **************************************
        2020-03-19 08:17:53.660012 | primary | Thursday 19 March 2020 08:17:53 +0000 (0:00:00.077) 0:37:23.529 ********
        2020-03-19 10:26:55.333534 | primary | fatal: [undercloud]: FAILED! => {
        * 2020-03-19 10:27:09.543498 | primary | Data could not be sent to remote host "127.0.0.2". Make sure this host can be reached over ssh: ssh: connect to host 127.0.0.2 port 22: Connection timed out

And no logs.
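The timestamps in the excerpts above show how long the `Execute tempest tests` task had been running when each job died. A small sketch to compute that, assuming the `<timestamp> | <node> | <message>` layout of these job-output.txt lines; `line_timestamp` and `elapsed` are hypothetical helpers, and the sample lines are shortened to their leading part since only the timestamp matters.

```python
from datetime import datetime, timedelta

# Assumed layout of the job-output.txt lines quoted above:
# "<YYYY-MM-DD HH:MM:SS.ffffff> | <node> | <message>"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def line_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a job-output line (hypothetical helper)."""
    # Drop a leading "* " bullet, then keep everything before the first " | ".
    stamp = line.lstrip("* ").split(" | ", 1)[0]
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def elapsed(start_line: str, end_line: str) -> timedelta:
    """Wall-clock time between two job-output lines."""
    return line_timestamp(end_line) - line_timestamp(start_line)


# tempest-master job: tempest task start -> RUN END RESULT_TIMED_OUT
print(elapsed(
    "2020-03-19 18:51:46.673195 | primary | TASK [os_tempest : Execute tempest tests]",
    "2020-03-19 22:03:31.184555 | RUN END RESULT_TIMED_OUT",
))  # 3:11:44.511360

# compute-master job: tempest task start -> fatal: [undercloud]: FAILED!
print(elapsed(
    "2020-03-19 08:17:53.641532 | primary | TASK [os_tempest : Execute tempest tests]",
    "2020-03-19 10:26:55.333534 | primary | fatal: [undercloud]: FAILED! => {",
))  # 2:09:01.692002
```

So in the tempest-master run the tempest task had been going for over three hours before the run timed out, and in the compute-master run for a little over two hours before the fatal error.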

Just posted a testproject at https://review.rdoproject.org/r/26020 and am trying it.

Neither job is stable at all:
        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-full-tempest-compute-master
        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-full-tempest-tempest-master
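To put a number on how unstable these jobs are, one could tally the results listed on those build pages. A rough sketch, assuming Zuul's REST builds API (which returns build records with a `result` field such as `SUCCESS`, `FAILURE` or `TIMED_OUT`) is reachable at a whitelabeled path like `https://review.rdoproject.org/zuul/api/builds`; that endpoint and the helpers `fetch_builds` and `summarize_results` are assumptions, not something confirmed in this bug.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


def fetch_builds(api_url: str, job_name: str) -> List[dict]:
    """Fetch build records for one job from a Zuul builds API endpoint.

    The endpoint is an assumption; it is expected to return a JSON list
    of build objects.
    """
    query = urllib.parse.urlencode({"job_name": job_name})
    with urllib.request.urlopen(f"{api_url}?{query}", timeout=30) as resp:
        return json.load(resp)


def summarize_results(
    builds: Iterable[Mapping[str, Optional[str]]],
) -> Tuple[Dict[str, int], float]:
    """Count finished builds per result and return (counts, success_rate).

    Builds whose "result" is missing or None (still running) are skipped.
    """
    counts = Counter(
        b["result"] for b in builds if b.get("result") is not None
    )
    total = sum(counts.values())
    rate = counts["SUCCESS"] / total if total else 0.0
    return dict(counts), rate
```

For example, `summarize_results(fetch_builds(api_url, "periodic-tripleo-ci-centos-8-standalone-full-tempest-compute-master"))` would give per-result counts and a success rate for that job's recent builds, provided the assumed endpoint is correct.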

Marios Andreou (marios-b) wrote:

Another update: on the held node (from comment #2), tempest failed in yet another way, with tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps failing:

        * AssertionError: Timed out waiting for 10.100.0.8 to become reachable from 192.168.24.103

Discussed this afternoon on the centos8 sync; we agreed the best way forward to stabilize these jobs is to split the api and scenario tests, as we did for fs20.
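Splitting means running the `tempest.api.*` and `tempest.scenario.*` tests in separate jobs, presumably so each job runs fewer tests within its timeout. A minimal sketch of partitioning test IDs by that prefix, using the two failing tests from this bug; `split_tests` is a hypothetical helper and this is not necessarily how the tripleo jobs implement the split. For tempest itself, a comparable filter would be something like `tempest run --regex '^tempest\.api\.'`.

```python
from typing import Dict, Iterable, List

# Tempest test IDs are dotted Python paths: API tests live under
# tempest.api and scenario tests under tempest.scenario.
GROUPS = {"api": "tempest.api.", "scenario": "tempest.scenario."}


def split_tests(test_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Partition test IDs into api / scenario / other buckets by prefix."""
    buckets: Dict[str, List[str]] = {name: [] for name in GROUPS}
    buckets["other"] = []
    for test_id in test_ids:
        for name, prefix in GROUPS.items():
            if test_id.startswith(prefix):
                buckets[name].append(test_id)
                break
        else:
            # No known prefix matched.
            buckets["other"].append(test_id)
    return buckets


tests = [
    "tempest.api.compute.servers.test_device_tagging.TaggedBootDevicesTest.test_tagged_boot_devices",
    "tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps",
]
for name, ids in split_tests(tests).items():
    print(name, ids)
```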

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released