v234 seems to fail to reboot 5 times in a row on s390x, and crashes amd64/i386 instances

Bug #1708051 reported by Dimitri John Ledkov
This bug affects 1 person
Affects: systemd (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

ppc64el is fine.

Given that everything is fixed to start containers and VMs non-degraded, let's enforce that we boot non-degraded.

Also, the network-online timeout is 30s, so it makes no sense to give the boot only 10s, especially since machines can be overcommitted.

Update: the tests were improved to be more deterministic and to always wait for the boot to finish fully before rebooting. On the infrastructure, all architectures except i386/amd64 pass, but the failure is not reproducible locally. It almost feels as if the openstack-nova-autopkgtest-reboot-marker integration is broken, or systemd fails to reboot in scalingstack.
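
As an illustration of waiting for the boot to finish and then requiring a non-degraded state, here is a minimal sketch. The function name wait_for_boot and the 60s default are assumptions made for this example and are not taken from the systemd test suite; the only systemd interface used is `systemctl is-system-running`.

```shell
#!/bin/sh
# Sketch: poll until boot has finished, then require a non-degraded system.
# wait_for_boot and its default timeout are illustrative, not from the tests.
wait_for_boot() {
    timeout="${1:-60}"   # seconds; chosen to exceed the 30s network-online timeout
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Prints a state such as initializing, starting, running or degraded.
        state="$(systemctl is-system-running 2>/dev/null || true)"
        case "$state" in
            running) return 0 ;;
            initializing|starting|"") ;;   # boot still in progress, keep polling
            *) echo "boot finished in state: $state" >&2; return 1 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "boot did not finish within ${timeout}s" >&2
    return 2
}
```

A test could call `wait_for_boot 60` before triggering a reboot and treat any non-zero return as a failure, which keeps "still booting" (2) distinct from "booted degraded" (1).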

Tags: adt-fail
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

$ cat logs-amd64/summary
timedated PASS
hostnamed PASS
localed-locale PASS
localed-x11-keymap PASS
logind PASS
unit-config PASS
storage PASS
networkd-test.py PASS
build-login PASS
boot-and-services PASS
udev PASS
root-unittests PASS
upstream PASS
boot-smoke PASS
systemd-fsckd PASS

$ cat logs-i386/summary
timedated PASS
hostnamed PASS
localed-locale PASS
localed-x11-keymap PASS
logind PASS
unit-config PASS
storage PASS
networkd-test.py PASS
build-login PASS
boot-and-services PASS
udev PASS
root-unittests PASS
upstream PASS
boot-smoke PASS
systemd-fsckd PASS

tags: added: adt-fail block-proposed
Revision history for this message
Dimitri John Ledkov (xnox) wrote :
description: updated
Revision history for this message
Iain Lane (laney) wrote :

xnox, disabling these is worrying - the systemd tests quite reliably pass with the old systemd (233-8ubuntu3) and fail with the new (234-2ubuntu2) - this looks like an actual problem to me, and not something to blame on the infrastructure and hide by skipping the tests?

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

systemd reboots fine on bare metal, in qemu, and on canonistack instances (both ssh reboot & nova reboot).
The boot-smoke/systemd-fsckd tests pass on ppc64el.
These tests also pass on the amd64/i386 qemu runner when executed with the FORCE_REBOOT_TEST=1 environment variable.

Dropping the block-proposed tag for the 234-2ubuntu5 upload; it will be re-added to force a manual run of the two tests mentioned above on any future uploads.

tags: removed: block-proposed
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@laney

After many local runs, I never see a failure of boot-smoke. I do see flaky failures of the nspawn tests, which are now logged in the upstream tracker and will be investigated further - https://github.com/systemd/systemd/issues/6614

The reboot test failures do mention an unexpected EOF, which on the surface looks related to the ssh fixes in autopkgtest 4.4 and to the patches, reported in the Debian BTS, that resolve a similar EOF with the qemu runner.
Next steps for me are:
* continue running the tests locally with the FORCE_REBOOT_TEST=1 environment variable to validate each upload (no regressions)
* pull latest autopkgtest with pending patches from git and run that with ssh runner against canonistack
* hopefully reproduce & fix the hang with canonistack assistance

I do not have any confirmation that systemd in proposed is bad.

Revision history for this message
Iain Lane (laney) wrote :

xnox and I debugged this together. It seems that the upload of 234 in Debian switched to Meson, and this introduced a bug where KillUserProcesses was set back to `yes' instead of `no' as it's meant to be. To reboot the machine, autopkgtest runs `sh -c (sleep 3; reboot) &' over SSH. When this is backgrounded and the SSH connection (and with it the logind session) ends, systemd kills the sleep and the reboot is never run, so we see a system that has come up but never gone down.
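
For anyone wanting to check what a machine's logind configuration says about this setting, here is a minimal sketch. The function name is hypothetical, and it reads only the single file it is given: drop-in directories and the compiled-in default (which is what regressed here) are not consulted, so "unset" means the build default applies.

```shell
#!/bin/sh
# Sketch: report the KillUserProcesses value set in a logind.conf-style file.
# kill_user_processes_setting is an illustrative name, not a systemd tool.
kill_user_processes_setting() {
    conf="${1:-/etc/systemd/logind.conf}"
    # Commented-out lines (#KillUserProcesses=...) do not match the pattern.
    # If the key is assigned more than once, keep the last assignment.
    value="$(sed -n 's/^[[:space:]]*KillUserProcesses[[:space:]]*=[[:space:]]*//p' "$conf" 2>/dev/null | tail -n 1)"
    echo "${value:-unset}"
}
```

Running `kill_user_processes_setting` with no argument inspects /etc/systemd/logind.conf. An "unset" result on an affected 234 build would be consistent with the behaviour described above, since the default itself was `yes'.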

I found this by entering a machine in the bad state and looking at the journal. In future I suggest that systemd's autopkgtests output the journal to their log so that this would be debuggable without admin access next time.
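
A rough sketch of what that suggestion could look like in a test script follows. The helper name dump_journal is hypothetical, and it assumes the runner exports AUTOPKGTEST_ARTIFACTS as the directory it collects into the logs; /tmp is only a fallback for running the snippet by hand.

```shell
#!/bin/sh
# Sketch: save the current boot's journal where the test runner collects logs.
# dump_journal is an illustrative name; AUTOPKGTEST_ARTIFACTS is assumed to be
# set by the runner.
dump_journal() {
    dir="${1:-${AUTOPKGTEST_ARTIFACTS:-/tmp}}"
    mkdir -p "$dir" || return 1
    # -b limits output to the current boot; --no-pager avoids blocking in scripts.
    journalctl -b --no-pager > "$dir/journal.txt" 2>&1
}
```

Calling `dump_journal` at the start of the post-reboot phase of a test would capture the boot that failed to go down, which is the state that had to be inspected by hand here.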

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 234-2ubuntu6

---------------
systemd (234-2ubuntu6) artful; urgency=medium

  * Disable KillUserProcesses, yet again, with meson this time.
  * Re-enable reboot tests.

 -- Dimitri John Ledkov <email address hidden> Thu, 17 Aug 2017 15:22:35 +0100

Changed in systemd (Ubuntu):
status: New → Fix Released