Zuul console spam: [primary] Waiting on logger

Bug #1806655 reported by Sorin Sbarnea
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
OpenStack-Gate  Invalid  Undecided   Unassigned
tripleo         New      Undecided   Unassigned

Bug Description

It seems that Zuul is spamming the logs with repeated "[primary] Waiting on logger" messages, which appear to be emitted at short intervals.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%5Bprimary%5D%20Waiting%20on%20logger%5C%22

The purpose of this bug is to track the issue over time, so we will add an elastic-recheck filter for it.
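
The Logstash link above carries its search as a percent-encoded query string. As a minimal sketch using only the Python standard library, the query can be decoded to see the exact search text, or re-encoded from the plain form. The helper names `decode_query` and `encode_query` are illustrative, not part of any OpenStack tooling.

```python
from urllib.parse import quote, unquote

# Query fragment copied verbatim from the Logstash URL in this report.
ENCODED_QUERY = "message%3A%5C%22%5Bprimary%5D%20Waiting%20on%20logger%5C%22"


def decode_query(encoded: str) -> str:
    """Return the human-readable form of a percent-encoded query."""
    return unquote(encoded)


def encode_query(plain: str) -> str:
    """Percent-encode a query, escaping every reserved character."""
    return quote(plain, safe="")


if __name__ == "__main__":
    # Shows the search string with its backslash-escaped quotes.
    print(decode_query(ENCODED_QUERY))
```

Decoding makes it easier to see that the search is an exact-phrase match on the `message` field.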

Clark Boylan (cboylan) wrote :

This happens if the Zuul logger daemon on the test nodes is stopped. One reason this may happen is that the job reboots the test node. Looking at logstash, this appears to be tripleo-specific behavior. Can we add tripleo to the bug if it is specific to those jobs? I doubt that the Zuul logger is crashing on its own; more likely the job is killing it somehow.
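
One way to confirm from another host that the logger daemon is no longer listening is a plain TCP connection attempt. The sketch below assumes the console streamer listens on port 19885 (believed to be the default, but check your deployment), and the function name `logger_reachable` is hypothetical.

```python
import socket

# Assumption: 19885 is the console streamer port; adjust if your setup differs.
ZUUL_CONSOLE_PORT = 19885


def logger_reachable(host: str, port: int = ZUUL_CONSOLE_PORT,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not listening".
        return False
```

A `False` result after a node reboot would be consistent with the daemon having been stopped and not restarted.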

Matt Riedemann (mriedem) wrote :

There is no reason to put this into elastic-recheck: the query shows a 100% success rate in build_results, i.e. the jobs where this message shows up have not failed.
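
The reasoning here can be sketched as a simple check: a log signature is only useful for classifying failures if some of the builds that hit it actually failed. The status strings and the function names below are assumptions for illustration, not elastic-recheck's real interface.

```python
from collections import Counter
from typing import Iterable


def failure_share(build_statuses: Iterable[str]) -> float:
    """Fraction of matching builds whose status is not SUCCESS."""
    counts = Counter(build_statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts["SUCCESS"] / total


def worth_tracking(build_statuses: Iterable[str], threshold: float = 0.0) -> bool:
    """A signature is worth tracking only if some matching builds failed."""
    return failure_share(build_statuses) > threshold
```

With every matching build reporting success, `failure_share` is 0.0 and the signature carries no information about failures.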

Clark Boylan (cboylan) wrote :

http://logs.openstack.org/25/620625/2/gate/tripleo-ci-centos-7-standalone/70949b6/logs/undercloud/var/log/journal.txt.gz#_Dec_06_16_08_49

I tracked this down to that log file. The host is being rebooted. When this happens you have to restart the log streamer for Zuul. You can do this using the start-zuul-console role, http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/start-zuul-console.

As for why the reboot happens, I'm not entirely sure yet. It seems to happen after running tempest. I don't think tempest should cause a reboot, so you may want to double-check whether this behavior is expected. If it is expected, consider adding a wait in Ansible so that the nested ara report isn't truncated.

Changed in openstack-gate:
status: New → Invalid
Clark Boylan (cboylan) wrote :

logan- has pointed out that nested virt can cause these crashes, which may explain why tempest can cause a reboot. If that is the case, you probably want to switch back to qemu instead of kvm. It would also be worth detecting the reboot case more robustly, since unexpected reboots are likely failures.
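
One possible way to detect an unexpected reboot on a Linux node is to compare the kernel's boot ID before and after a task, since that value changes on every boot. This is only a sketch of the idea, not what the tripleo jobs do; the helper names are hypothetical.

```python
from pathlib import Path

# Linux exposes a per-boot random UUID here; it changes after every reboot.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def read_boot_id(path: Path = BOOT_ID_PATH) -> str:
    """Return the current boot ID with surrounding whitespace removed."""
    return path.read_text().strip()


def rebooted(before: str, after: str) -> bool:
    """True if the boot ID recorded earlier differs from the current one."""
    return before != after
```

A job could record `read_boot_id()` at the start, read it again after tempest, and fail (or restart the console streamer) if `rebooted()` returns True.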

Alex Schultz (alex-schultz) wrote :

In terms of tripleo, this is tracked via Bug 1806720. I'll mark this as a dupe of that one.

Sorin Sbarnea (ssbarnea)
summary: - Zuul console spam: [primary] Waiting for logger
+ Zuul console spam: [primary] Waiting on logger
description: updated