autopkgtest boot-and-services fails on many architectures very often since systemd/239-7ubuntu12

Bug #1805358 reported by Christian Ehrhardt  on 2018-11-27
Affects            Importance   Assigned to
systemd (Ubuntu)   Undecided    Unassigned
Bionic             Undecided    Unassigned
Cosmic             Undecided    Unassigned

Bug Description

[Impact]

 * Boot and services autopkgtest case has become flakey, especially on ppc64el but other architectures too.

[Test Case]

 * Check that boot-and-services passes more reliably

[Regression Potential]

 * this is a test change only

[Other Info]

 * gdm3 used to start reliably in nested qemu with dummy xorg, but it doesn't anymore. Thus stop testing that gdm3 comes up as part of the systemd autopkgtest. In reality it is not that interesting to see whether gdm3 works with dummy xorg, as we mostly care about gdm3 working on bare metal with proper graphics cards (intel/amd/nvidia/etc).

I appreciate that some old known issues on 'upstream' are fixed.
But looking at the current tests I see that it fails often on amd64 (2/3) and very often on ppc64el (4/4 since ubuntu12 and 11/11 before that version).

Looking at the list of people hitting retry and the list of blocked packages, this seems to be an issue that must be checked.

See current head of tests at:
amd64: http://autopkgtest.ubuntu.com/packages/s/systemd/disco/amd64
ppc64el: http://autopkgtest.ubuntu.com/packages/s/systemd/disco/ppc64el

One example log:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco/disco/ppc64el/s/systemd/20181126_074048_cb8d2@/log.gz

Example of the error that one can see in the log:
test_no_failed (__main__.ServicesTest)
No failed units ... -------- journal for failed service user@118.service -----------
-- Logs begin at Tue 2018-10-30 09:54:39 UTC, end at Mon 2018-11-26 07:05:17 UTC. --
Nov 26 07:05:09 autopkgtest systemd[1]: Starting User Manager for UID 118...
Nov 26 07:05:10 autopkgtest systemd[1292]: pam_unix(systemd-user:session): session opened for user gdm by (uid=0)
Nov 26 07:05:10 autopkgtest systemd[1292]: Failed to fully start up daemon: Permission denied
Nov 26 07:05:10 autopkgtest systemd[1]: user@118.service: Failed with result 'protocol'.
Nov 26 07:05:10 autopkgtest systemd[1]: Failed to start User Manager for UID 118.
FAIL

======================================================================
FAIL: test_no_failed (__main__.ServicesTest)
No failed units
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/autopkgtest.Xm5RBa/build.JSa/src/debian/tests/boot-and-services", line 62, in test_no_failed
    self.assertEqual(failed, [])
AssertionError: Lists differ: ['user@118.service loaded failed failed User Manager for UID 118'] != []

First list contains 1 additional elements.
First extra element 0:
'user@118.service loaded failed failed User Manager for UID 118'

- ['user@118.service loaded failed failed User Manager for UID 118']
+ []

Reproducing the bug:
- of course it had to work reliably on a local amd64
- ppc based autopkgtest has the usual non-fun of bug 1630909 still failing me

But when checking the logs that we have we see:
- service user@118.service is the one failing
- might that be related to the GDM issues we had in the past?

Log:
pam_unix(systemd-user:session): session opened for user gdm by (uid=0)
Failed to fully start up daemon: Permission denied
user@118.service: Failed with result 'protocol'.
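For anyone trying to reproduce this locally, the following is a minimal sketch of how the failed units and their journal could be collected. It is not the code from debian/tests/boot-and-services; the helper names (parse_failed_units, list_failed_units, journal_for) are made up for illustration, and it assumes the check is based on the output of systemctl --state=failed.

```python
import subprocess
from typing import List


def parse_failed_units(systemctl_output: str) -> List[str]:
    """Return unit names from `systemctl --state=failed --no-legend` output."""
    units = []
    for line in systemctl_output.splitlines():
        # Some systemd versions prefix failed units with a bullet; drop it.
        fields = line.replace("●", " ").split()
        if fields:
            units.append(fields[0])
    return units


def list_failed_units() -> List[str]:
    """Ask systemd for the failed units (needs a running systemd)."""
    out = subprocess.check_output(
        ["systemctl", "--state=failed", "--no-legend", "--plain", "--no-pager"],
        text=True,
    )
    return parse_failed_units(out)


def journal_for(unit: str) -> str:
    """Return the current-boot journal of one unit (needs journald)."""
    return subprocess.check_output(
        ["journalctl", "--no-pager", "-b", "-u", unit], text=True
    )
```

On an affected system, calling journal_for() on each entry of list_failed_units() should give output comparable to the excerpt above.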

It seems to me that this:
+ * boot-and-services: skip gdm test, when gdm-x-session fails.
+ Across all architectures, gdm fails to come up reliably since cosmic.
+ (LP: #1790478)

Needs to be extended for ppc64

I wonder what you'd think about either:

#1
--- a/debian/tests/boot-and-services
+++ b/debian/tests/boot-and-services
@@ -62,6 +62,7 @@ class ServicesTest(unittest.TestCase):
             self.assertEqual(failed, [])

     @unittest.skipUnless(subprocess.call(['which', 'gdm3'], stdout=subprocess.DEVNULL) == 0, 'gdm3 not found')
+    @unittest.skipUnless(subprocess.call(['ps', 'u', '-C', 'gdm-x-session'], stdout=subprocess.DEVNULL) == 0, 'gdm-x-session failed to start')
     def test_gdm3(self):
         out = subprocess.check_output(['ps', 'u', '-C', 'gdm-x-session'])
         self.assertIn(b'gdm-x-session gnome-session', out)

or

#2
--- a/debian/tests/boot-and-services
+++ b/debian/tests/boot-and-services
@@ -51,6 +51,8 @@ class ServicesTest(unittest.TestCase):
         failed = [f for f in failed if 'thermald' not in f]
         # console-setup.service fails on devices without keyboard (LP: #1516591)
         failed = [f for f in failed if 'console-setup' not in f]
+        # gdm lets user 118 fail (LP: #1805358)
+        failed = [f for f in failed if 'user@118.service' not in f]
         # cpi.service fails on s390x
         failed = [f for f in failed if 'cpi.service' not in f]
         if failed:

@xnox - would you have any preference between #1, #2 or neither of the above?
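To make the comparison easier, here is a standalone sketch of both options outside the real test file. The names filter_failed, process_running and ServicesSketch are hypothetical, and the ignore list only contains the entries visible in the diff context above.

```python
import shutil
import subprocess
import unittest

# Substrings of units whose failure is tolerated. The first three are taken
# from the existing test; the last one is the addition proposed in #2.
IGNORED = ("thermald", "console-setup", "cpi.service", "user@118.service")


def filter_failed(failed):
    """Option #2: drop failed-unit lines matching a tolerated substring."""
    return [f for f in failed if not any(name in f for name in IGNORED)]


def process_running(name):
    """Option #1 helper: True if `ps -C name` finds at least one process."""
    try:
        rc = subprocess.call(
            ["ps", "u", "-C", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ps itself is not available; treat the process as not running.
        return False
    return rc == 0


class ServicesSketch(unittest.TestCase):
    # Both conditions are evaluated once, when the class body is executed.
    @unittest.skipUnless(shutil.which("gdm3"), "gdm3 not found")
    @unittest.skipUnless(process_running("gdm-x-session"),
                         "gdm-x-session failed to start")
    def test_gdm3(self):
        out = subprocess.check_output(["ps", "u", "-C", "gdm-x-session"])
        self.assertIn(b"gdm-x-session gnome-session", out)
```

Two things to keep in mind: the skipUnless conditions in #1 run when the test module is loaded, so a gdm-x-session that starts late would still be skipped; and #2 hard-codes UID 118, which is presumably just the UID the gdm user happened to get in this image.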

FYI - there is a chance that it passes; one of my retries just worked by chance.
But given the low success rate I still think we should change the test to ignore that error in some way.

Since it fails most reliably on LP infra and almost never locally, I tried to verify it in [1][2].
But this becomes a can of worms, now hitting [3].

Whenever you do the next merge/fix you will need to fix that anyway. The next upload, which will follow sooner or later, will have to pass the tests as well, so I'll avoid duplicating work.

Instead I attach my current patch for your consideration. Since it only ignores one service failure it should be rather safe: worst case it doesn't fix the issue, but it should not cause extra breakage.

Leaving it to you; let me know if I can help with anything.

[1]: https://bileto.ubuntu.com/#/ticket/3541
[2]: https://bileto.ubuntu.com/#/ticket/3542
[3]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=909455

tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 239-7ubuntu14

---------------
systemd (239-7ubuntu14) disco; urgency=medium

  * Fix compat with new meson.
    File: debian/patches/meson-rename-Ddebug-to-Ddebug-extra.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=3b764ec1b76768a8c40635019fa5a8acb81b223e

 -- Dimitri John Ledkov <email address hidden> Thu, 29 Nov 2018 16:53:00 +0000

Changed in systemd (Ubuntu):
status: New → Fix Released

Thanks!

Changed in systemd (Ubuntu Cosmic):
status: New → In Progress
description: updated

Hello Christian, or anyone else affected,

Accepted systemd into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/239-7ubuntu10.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Cosmic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Dimitri John Ledkov (xnox) wrote :

boot-and-services passed on all arches in cosmic autopkgtests.

tags: added: verification-done verification-done-cosmic
removed: verification-needed verification-needed-cosmic