010-standalone is randomly failing

Bug #1861486 reported by Cédric Jeanneret
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Cédric Jeanneret
Milestone: (none listed)

Bug Description

010-standalone is randomly failing due to a failed healthcheck for the nova_migration_target container:

2020-01-30 19:47:40.220619 | primary | TASK [validate-services : Print out any failed Systemd services for tripleo_*] ***
2020-01-30 19:47:40.249635 | primary | Thursday 30 January 2020 19:47:40 +0000 (0:00:00.721) 1:03:01.731 ******
2020-01-30 19:47:40.300366 | primary | ok: [undercloud] => {
2020-01-30 19:47:40.300528 | primary | "systemd_state.stdout_lines": [
2020-01-30 19:47:40.300742 | primary | "tripleo_nova_migration_target_healthcheck.service loaded failed failed nova_migration_target healthcheck"
2020-01-30 19:47:40.300816 | primary | ]
2020-01-30 19:47:40.300858 | primary | }
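
To make the failure condition concrete, here is a minimal sketch of how lines like the one above could be filtered for failed healthcheck units. It assumes each entry follows the usual `systemctl list-units` column order (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION). The names `UnitState`, `parse_unit_line` and `failed_healthchecks` are hypothetical and are not part of the validate-services role.

```python
from typing import List, NamedTuple, Optional


class UnitState(NamedTuple):
    """One systemd unit line, split into its columns."""
    unit: str
    load: str
    active: str
    sub: str
    description: str


def parse_unit_line(line: str) -> Optional[UnitState]:
    # Assumed column layout: UNIT LOAD ACTIVE SUB DESCRIPTION.
    # The description may contain spaces, so split at most 4 times.
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None  # not a unit line (blank or truncated)
    description = parts[4] if len(parts) == 5 else ""
    return UnitState(parts[0], parts[1], parts[2], parts[3], description)


def failed_healthchecks(lines: List[str]) -> List[str]:
    """Return the names of tripleo_* healthcheck units in the failed state."""
    failed = []
    for line in lines:
        state = parse_unit_line(line)
        if state is None:
            continue
        if (state.unit.startswith("tripleo_")
                and state.unit.endswith("_healthcheck.service")
                and state.active == "failed"):
            failed.append(state.unit)
    return failed


# Sample input copied from the job output quoted in this report.
stdout_lines = [
    "tripleo_nova_migration_target_healthcheck.service loaded failed failed "
    "nova_migration_target healthcheck",
]
print(failed_healthchecks(stdout_lines))
```

With the line from this job, the sketch reports `tripleo_nova_migration_target_healthcheck.service` as the only failed unit.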

For instance:
https://7aeffcdc6d98e76087e1-23561effe696d4d724b67bcb38a7b69d.ssl.cf2.rackcdn.com/704773/5/check/tripleo-ci-centos-7-scenario010-standalone/f9e51d3/job-output.txt

The validate-services check was added back in April 2019 [1] and apparently had never caused any issue before.

The problem is probably related to this patch:
https://review.opendev.org/703819 (and its backport to Train: https://review.opendev.org/704651)

Reverts of both changes are in the pipeline:
Master: https://review.opendev.org/705177
Train: https://review.opendev.org/705178

I'm also looking into this issue to understand why it's failing only on that job (the validate-services check was added to a total of 4 jobs: 3 standalone and 1 compute [1]).

[1] https://review.opendev.org/637729

Tags: alert ci
Cédric Jeanneret (cjeanner) wrote :

The issue was hit on another job with another healthcheck:
tripleo-ci-centos-7-containerized-undercloud-upgrades

2020-01-30 16:28:34.334536 | primary | ok: [undercloud] => {
2020-01-30 16:28:34.334841 | primary | "systemd_state.stdout_lines": [
2020-01-30 16:28:34.335092 | primary | "tripleo_ironic_inspector_dnsmasq_healthcheck.service loaded failed failed ironic_inspector_dnsmasq healthcheck"
2020-01-30 16:28:34.335141 | primary | ]
2020-01-30 16:28:34.335177 | primary | }

wes hayutin (weshayutin)
Changed in tripleo:
status: Confirmed → Triaged
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Incomplete → Fix Released