FAIL_PKG_STRINGS not considered in temp_fails

Bug #1978741 reported by Brian Murray
This bug affects 1 person
Affects: Auto Package Testing
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

# Some packages can provoke specific breakage. For most packages, this would be
# a sign of infrastructure trouble, but for these we should play it safe and
# consider these to be regressions. If they *are* infrastructure problems,
# we'll have to retry them.
FAIL_PKG_STRINGS = {'systemd*': ['timed out waiting for testbed to reboot',
                                 'Timed out on waiting for ssh connection',
                                 'Temporary failure resolving',
                                 'VirtSubproc.Timeout',
                                 'ERROR: testbed failure: testbed auxverb failed with exit code 255'],

However, FAIL_PKG_STRINGS isn't used when constructing temp_fails:

elif is_failure:
    contents = log_contents(out_dir)
    temp_fails = [s for s in (set(TEMPORARY_TEST_FAIL_STRINGS)
                              - set(getglob(OK_PKG_STRINGS, pkgname, [])))
                  if s in contents]
    if temp_fails:
        logging.warning('Saw %s in log, which is a sign of a temporary failure.',
                        ' and '.join(temp_fails))
        logging.warning('%sLog follows:', retrying)
        logging.error(contents)
        if retry < 2:
            submit_metric(architecture, code, pkgname, current_region, True, release)
            cleanup_and_sleep(out_dir)
    else:
        break

I noticed this because we ended up in a situation where systemd was in fact failing to resolve ftpmaster.internal and the tests were continuously retried.
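One way the check could take FAIL_PKG_STRINGS into account is sketched below. This is only an illustration, not the worker's actual code: `find_temp_fails` is a hypothetical helper, `getglob` is a stand-in written on the assumption that the real one returns the value of the first glob key matching the package name, and the two TEMPORARY_TEST_FAIL_STRINGS entries are just the ones that appear in the worker's warning for this bug.

```python
import fnmatch


def getglob(mapping, name, default=None):
    # Stand-in for the worker's getglob(); assumed behaviour: return the value
    # of the first key whose glob pattern matches the package name.
    for pattern, value in mapping.items():
        if fnmatch.fnmatch(name, pattern):
            return value
    return default


# Illustrative subsets only; the real lists in the worker are longer.
TEMPORARY_TEST_FAIL_STRINGS = [
    "Temporary failure resolving 'ftpmaster.internal'",
    "Failed to fetch http://ftpmaster.internal/",
]
OK_PKG_STRINGS = {}
FAIL_PKG_STRINGS = {'systemd*': ['Temporary failure resolving']}


def find_temp_fails(contents, pkgname):
    """Hypothetical helper: strings that mark the log as a temporary failure.

    Returns an empty list as soon as the log contains a string that
    FAIL_PKG_STRINGS lists as a real failure for this package.
    """
    fail_strings = getglob(FAIL_PKG_STRINGS, pkgname, [])
    if any(f in contents for f in fail_strings):
        return []
    ok_strings = set(getglob(OK_PKG_STRINGS, pkgname, []))
    return sorted(s for s in set(TEMPORARY_TEST_FAIL_STRINGS) - ok_strings
                  if s in contents)
```

With this sketch, a systemd log containing "Temporary failure resolving" yields no temporary failures and so would not be retried, while the same log for any other package still would be.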

E: Failed to fetch http://ftpmaster.internal/ubuntu/pool/main/e/elfutils/libdw-dev_0.187-1_ppc64el.deb Temporary failure resolving 'ftpmaster.internal'
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
autopkgtest [15:09:55]: ERROR: testbed failure: apt repeatedly failed to download packages
autopkgtest [15:09:56]: test tests-in-lxd: -----------------------]
autopkgtest [15:09:57]: test tests-in-lxd: - - - - - - - - - - results - - - - - - - - - -
tests-in-lxd FAIL non-zero exit status

Tags: adt-56
Brian Murray (brian-murray) wrote:

And the log from the journal on the cloud-worker:

Jun 14 10:35:04 juju-4d1272-prod-proposed-migration-4 /home/ubuntu/autopkgtest-cloud/worker/worker[1026698]: WARNING: Saw Temporary failure resolving 'ftpmaster.internal' and Failed to fetch http://ftpmaster.internal/ in log, which is a sign of a temporary failure.
Jun 14 10:35:04 juju-4d1272-prod-proposed-migration-4 /home/ubuntu/autopkgtest-cloud/worker/worker[1026698]: WARNING: Retrying in 5 minutes... Log follows:

This should not have been considered a "Temporary failure" given the string in FAIL_PKG_STRINGS.
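The warning above lists two matched strings, which matters for any fix. The sketch below compares two hypothetical ways of filtering, neither taken from the worker. It assumes that the two strings joined by " and " in the warning are entries of TEMPORARY_TEST_FAIL_STRINGS, and that 'Temporary failure resolving' is the relevant FAIL_PKG_STRINGS entry for systemd.

```python
# Assumed entries, read off the WARNING line in the journal.
TEMPORARY_TEST_FAIL_STRINGS = [
    "Temporary failure resolving 'ftpmaster.internal'",
    "Failed to fetch http://ftpmaster.internal/",
]
# The relevant entry of FAIL_PKG_STRINGS['systemd*'].
SYSTEMD_FAIL_STRINGS = ['Temporary failure resolving']

LOG = ("E: Failed to fetch http://ftpmaster.internal/ubuntu/pool/main/e/elfutils/"
       "libdw-dev_0.187-1_ppc64el.deb Temporary failure resolving 'ftpmaster.internal'")


def by_set_difference(contents):
    # Mirrors how OK_PKG_STRINGS is subtracted: exact string matches only.
    candidates = set(TEMPORARY_TEST_FAIL_STRINGS) - set(SYSTEMD_FAIL_STRINGS)
    return sorted(s for s in candidates if s in contents)


def by_substring(contents):
    # Drops a temporary-failure string if it contains a package fail string.
    return sorted(s for s in TEMPORARY_TEST_FAIL_STRINGS
                  if s in contents
                  and not any(f in s for f in SYSTEMD_FAIL_STRINGS))


print(by_set_difference(LOG))
print(by_substring(LOG))
```

Under these assumptions, the exact-match difference removes nothing, because the FAIL_PKG_STRINGS entry is shorter than the temporary-failure string. Substring filtering still leaves the "Failed to fetch" entry, so the test would be retried either way unless a FAIL_PKG_STRINGS match in the log suppresses the retry altogether.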

description: updated
tags: added: adt-56
Changed in auto-package-testing:
importance: Undecided → High
Brian Murray (brian-murray) wrote:

This also failed on amd64:

https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic-enr0n-systemd-251/kinetic/amd64/s/systemd/20220613_222811_16846@/log.gz

I haven't been able to recreate it in staging now though. Maybe it has to do with the version of lxd being used now?

autopkgtest [03:03:51]: test tests-in-lxd: [-----------------------
2022-07-08T03:03:55Z INFO Waiting for automatic snapd restart...
lxd 5.3-91e042b from Canonical** installed

vs

autopkgtest [20:50:08]: test tests-in-lxd: [-----------------------
2022-06-13T20:50:12Z INFO Waiting for automatic snapd restart...
lxd 5.2-79c3c3b from Canonical** installed

Brian Murray (brian-murray) wrote:

Looking at this again now, I'm not sure we have enough information to determine where things actually went wrong. It's not clear to me whether we were in the "is_failure" branch or how many times the test was actually retried. Unfortunately, the historical autopkgtest-cloud-worker log files are gone, so we are at an impasse here.

Changed in auto-package-testing:
importance: High → Medium
Paride Legovini (paride)
Changed in auto-package-testing:
status: New → Incomplete
Launchpad Janitor (janitor) wrote:

[Expired for Auto Package Testing because there has been no activity for 60 days.]

Changed in auto-package-testing:
status: Incomplete → Expired