standalone tempest seems to just hang

Bug #1806720 reported by Alex Schultz
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz

Bug Description

We've had some failures in the standalone job (where tempest is not containerized) in which the job just hangs.

http://logs.openstack.org/99/618899/3/gate/tripleo-ci-centos-7-standalone/d0e6fe3/job-output.txt.gz#_2018-12-04_07_26_10_455770

2018-12-04 07:26:10.455770 | primary | TASK [validate-tempest : Execute tempest] **************************************
2018-12-04 07:26:10.633703 | primary | Tuesday 04 December 2018 07:26:10 +0000 (0:00:00.333) 0:53:11.516 ******
2018-12-04 09:04:58.438722 | [Zuul] Log Stream did not terminate
2018-12-04 09:04:58.513263 | primary | ERROR
2018-12-04 09:04:58.540492 | primary | {
2018-12-04 09:04:58.540730 | primary | "delta": "0:33:39.447322",
2018-12-04 09:04:58.540848 | primary | "end": "2018-12-04 09:04:27.226744",
2018-12-04 09:04:58.540959 | primary | "msg": "non-zero return code",
2018-12-04 09:04:58.541065 | primary | "rc": 2,
2018-12-04 09:04:58.541169 | primary | "start": "2018-12-04 08:30:47.779422"
2018-12-04 09:04:58.541294 | primary | }

The tempest logs just stop, and about an hour later the job fails. Log collection still works, so the system is still up.

Tags: ci
Arx Cruz (arxcruz)
Changed in tripleo:
assignee: nobody → Arx Cruz (arxcruz)
Alex Schultz (alex-schultz) wrote :

clarkb in #openstack-infra says the VM is crashing when tempest runs and that it's most likely related to nested virt. A workaround will be to stop dynamically using nested virt and force qemu for the standalone jobs.
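
To illustrate the difference between "dynamically using nested virt" and "forcing qemu", here is a minimal Python sketch. It assumes the choice ends up in nova's `[libvirt] virt_type` option and that nested support is detected through the standard Linux sysfs parameters of the `kvm_intel`/`kvm_amd` modules. The helper names are hypothetical; this is not the code from the actual fix.

```python
from pathlib import Path

# Kernel module parameters that report whether nested virtualization is on.
# These sysfs paths exist on Linux hosts with kvm_intel or kvm_amd loaded.
NESTED_PARAM_PATHS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def nested_virt_enabled(paths=NESTED_PARAM_PATHS):
    """Return True if any KVM module reports nested virtualization as enabled."""
    for path in paths:
        try:
            value = path.read_text().strip()
        except OSError:
            continue  # module not loaded, or the file is not readable
        if value in ("Y", "1"):  # newer kernels print Y/N, older ones 1/0
            return True
    return False


def choose_virt_type(force_qemu, nested=None):
    """Pick a value for nova's [libvirt] virt_type (hypothetical helper).

    Dynamic behaviour: use kvm when nested virt is available.
    Workaround: force_qemu=True always selects plain qemu emulation.
    """
    if force_qemu:
        return "qemu"
    if nested is None:
        nested = nested_virt_enabled()
    return "kvm" if nested else "qemu"
```

With `force_qemu=True` the host's nested-virt state is never consulted, which is the point of the workaround: guests are slower under pure emulation, but the job no longer depends on nested KVM behaving correctly.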

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/623293

Changed in tripleo:
assignee: Arx Cruz (arxcruz) → Alex Schultz (alex-schultz)
status: Triaged → In Progress
Sorin Sbarnea (ssbarnea) wrote :

Isn't forcing qemu going to introduce new performance issues on jobs that were already slow?

Alex Schultz (alex-schultz) wrote :

This is only for standalone, which isn't that slow. It's only going to increase tempest run time (and only for the tests that exercise the VM). This is likely an issue introduced by CentOS 7.6.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/623293
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a466cc6a2018f64e118024f9f8bd1bb1997a0310
Submitter: Zuul
Branch: master

commit a466cc6a2018f64e118024f9f8bd1bb1997a0310
Author: Alex Schultz <email address hidden>
Date: Thu Dec 6 12:24:41 2018 -0700

    Disable nested virt for standalone

    Nested virt is causing the VMs to crash in CI occasionally when tempest
    is running. While it's not ideal to use qemu, it's better not to have
    crashing VMs.

    Change-Id: Ia9944f6346709dbea9f480677d60efc7a5d4e162
    Closes-Bug: #1806720

Changed in tripleo:
status: In Progress → Fix Released
Matt Riedemann (mriedem) wrote :

Still seeing this:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%5Bprimary%5D%20Waiting%20on%20logger%5C%22%20%20%20%20%20AND%20tags%3A%5C%22console%5C%22&from=7d

Is ^ for something else (it's a very generic query) or does this fix need to be backported to stable branches?
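
The logstash link is easier to judge once its URL-encoded query is decoded. A small sketch using only Python's standard `urllib.parse`; the function name is hypothetical. It shows the dashboard is a phrase match on `[primary] Waiting on logger` in console logs with `from=7d` (presumably the last 7 days), which is why it can pick up unrelated failures.

```python
from urllib.parse import parse_qs, urlsplit

URL = ("http://logstash.openstack.org/#dashboard/file/logstash.json?query="
       "message%3A%5C%22%5Bprimary%5D%20Waiting%20on%20logger%5C%22"
       "%20%20%20%20%20AND%20tags%3A%5C%22console%5C%22&from=7d")


def decode_dashboard_query(url):
    """Return the parameters stored after '?' in the URL fragment."""
    # The query lives in the fragment: "dashboard/file/logstash.json?query=..."
    fragment = urlsplit(url).fragment
    _, _, query_string = fragment.partition("?")
    params = parse_qs(query_string)  # also decodes the %XX escapes
    # The dashboard stores quotes as \" inside the query; drop the backslashes.
    return {key: values[0].replace('\\"', '"') for key, values in params.items()}


params = decode_dashboard_query(URL)
print(params["query"])
print(params["from"])
```

The first line printed is `message:"[primary] Waiting on logger"` followed by `AND tags:"console"` (separated by the five encoded spaces), and the second is `7d`.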

Rafael Folco (rafaelfolco) wrote :

The error signature 'waiting on logger' for the recent failures (mostly 4/24) is just a red herring. This is not about nested virt but apparently a networking issue, possibly transient.

I was able to spot errors like:
- "ConnectTimeoutError"
- "Network is unreachable"
- "Could not resolve host: mirror.mtl01.inap.openstack.org; Unknown error"
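
The triage described above (ignore the generic hang line, look for specific errors) can be sketched as a signature scan over a job log. This is a hypothetical Python helper: the patterns are the strings quoted in this bug, while the category names and the rule that networking errors take precedence are assumptions for illustration, not an official CI classification.

```python
import re

# Signatures quoted in this bug report; the categories are illustrative.
SIGNATURES = {
    "network": [
        re.compile(r"ConnectTimeoutError"),
        re.compile(r"Network is unreachable"),
        re.compile(r"Could not resolve host: \S+"),
    ],
    "generic_hang": [
        re.compile(r"Waiting on logger"),
        re.compile(r"Log Stream did not terminate"),
    ],
}


def classify_failure(log_text):
    """Return the set of signature categories found anywhere in a job log."""
    found = set()
    for category, patterns in SIGNATURES.items():
        if any(p.search(log_text) for p in patterns):
            found.add(category)
    return found


def likely_cause(log_text):
    """Prefer specific networking signatures over the generic hang ones."""
    found = classify_failure(log_text)
    if "network" in found:
        return "network"
    if "generic_hang" in found:
        return "unknown (generic hang signature only)"
    return "unclassified"


sample = ("[primary] Waiting on logger\n"
          "Could not resolve host: mirror.mtl01.inap.openstack.org; Unknown error\n")
print(likely_cause(sample))
```

For the sample log this prints `network`: the 'Waiting on logger' line is present but is outranked by the DNS error, mirroring the reasoning in the comment above.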

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart-extras 2.1.1

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
