007-cmdmon flaky autopkgtest failure on armhf

Bug #2002910 reported by Bryce Harrington
Affects          Status  Importance  Assigned to  Milestone
chrony (Ubuntu)  New     Undecided   Unassigned

Bug Description

On armhf (only), chrony's autopkgtest fails intermittently in the 007-cmdmon test case of the 099-scfilter test. It typically passes after 1-3 retries. The same failure occurs with 4.3-1ubuntu1 and 4.2-2ubuntu2 in lunar, and with a few jammy versions. Focal and other releases have also had recurring flaky tests, but the test output there differs and indicates test timeouts.

While retrying has been an effective brute-force way of getting the package migrated, this bug report aims to understand why the test is failing and to find a better way to work around or fix it.

From the logs, the failing test output is as follows:

099-scfilter Testing system call filter in non-destructive tests:
  level -1:
    001-minimal OK
    002-extended OK
    003-memlock OK
    004-priority OK
    006-privdrop OK
    007-cmdmon BAD
FAIL

099-scfilter is a wrapper that runs the non-destructive test scripts, 007-cmdmon among them, with different values for $TEST_SCFILTER. If anything fails inside 007-cmdmon, that subtest is marked BAD and the 099-scfilter test as a whole is set to FAIL. (Note there are 10 subtest cases in total, but the wrapper bails out immediately when 007 fails.)
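
When comparing many CI logs, it can help to extract the first BAD subtest and the filter level it ran under. The following is a minimal sketch, not part of chrony or autopkgtest; the function name first_bad_subtest is hypothetical, and the line format is assumed from the log excerpt above only.

```python
import re

# Line formats assumed from the log excerpt in this report, not from chrony's sources.
LEVEL_RE = re.compile(r"^\s*level\s+(-?\d+):\s*$")
RESULT_RE = re.compile(r"^\s*(\d{3}-\S+)\s+(OK|BAD)\s*$")


def first_bad_subtest(log_text):
    """Return (level, test_name) for the first BAD result, or None if there is none."""
    level = None
    for line in log_text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            # Remember which filter level the following results belong to.
            level = int(match.group(1))
            continue
        match = RESULT_RE.match(line)
        if match and match.group(2) == "BAD":
            return level, match.group(1)
    return None
```

Applied to the excerpt above, this would report level -1 and 007-cmdmon, which at least confirms whether the other failing logs break at the same level.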

007-cmdmon invokes chronyc with a number of different commands ('refresh', 'add server', 'allow', 'reset sources', 'serverstats', 'settime now', and so on), verifies each command's exit code, and checks that the command's output is as expected, such as "200 OK".
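
The kind of check described above (exit code plus expected output) can be sketched as follows. This is an illustration of the pattern only, not the actual 007-cmdmon script, which is a shell script; check_command and its parameters are hypothetical names.

```python
import subprocess


def check_command(argv, expected_output, expected_code=0, timeout=30):
    """Run argv and return (ok, detail); detail says which check failed.

    Hypothetical helper that mirrors the two checks described in this
    report: the exit code first, then the command's output.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != expected_code:
        return False, (
            f"exit code {proc.returncode}, expected {expected_code}; "
            f"stderr: {proc.stderr.strip()}"
        )
    if expected_output not in proc.stdout:
        return False, f"output {proc.stdout.strip()!r} lacks {expected_output!r}"
    return True, "ok"


# Illustrative call only; the real test may pass additional chronyc options:
# check_command(["chronyc", "refresh"], "200 OK")
```

Returning the reason for the mismatch, rather than a bare BAD, is the detail the current logs are missing.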

Debian does not show this failure on armhf, but does hit it (or something quite similar) on s390x and ppc64el:
  * https://ci.debian.net/data/autopkgtest/unstable/s390x/c/chrony/28870427/log.gz
  * https://ci.debian.net/data/autopkgtest/unstable/ppc64el/c/chrony/28870171/log.gz

Debian does not have a bug report about this test failure. There is one existing bug report about a UNIX socket issue when running cmd allow / deny that might be relevant:
  * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995201

Red Hat also hits this failure in their CI:
  * https://bugzilla.redhat.com/show_bug.cgi?id=1990589
    "It seems this due to the glibc update. New system calls need to be allowed in the chrony seccomp filter. https://git.tuxfamily.org/chrony/chrony.git/commit/?id=bbbd80bf03223f181d4abf5c8e5fe6136ab6129a"
  * However, the patch they flagged as the fix is already present in chrony 4.3.
  * It does suggest there may be a similar issue due to a new syscall in glibc 2.36?
    https://lists.gnu.org/archive/html/info-gnu/2022-08/msg00000.html

Note that without more log detail about what fails inside 007-cmdmon, two 007-cmdmon failures may well be failing on completely different commands, and thus may be entirely unrelated issues. It may be necessary to run the test case locally on armhf to reproduce the failure and, if needed, add debugging to isolate exactly where it occurs.
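
Since the failure is intermittent, a local reproduction will likely need to repeat the test until it breaks and keep the output of the failing run. A minimal sketch of such a loop follows; run_until_failure is a hypothetical helper, and the command to run is left to the caller because the exact invocation on armhf is not established in this report.

```python
import subprocess


def run_until_failure(argv, max_runs=20, timeout=600):
    """Run argv repeatedly; return (run_number, CompletedProcess) for the
    first run with a nonzero exit code, or None if every run passed.

    Hypothetical helper for reproducing a flaky test; it assumes the test
    command signals failure through its exit code.
    """
    for attempt in range(1, max_runs + 1):
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0:
            # Captured stdout/stderr of the failing run are in proc.
            return attempt, proc
    return None
```

The captured stdout and stderr of the failing run can then be compared against a passing run to see which chronyc command differs.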
