219-1ubuntu1 regression: boot hangs, logind fails
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| systemd | Fix Released | Undecided | Martin Pitt | |
| systemd (Debian) | Fix Released | Unknown | | |
| systemd (Fedora) | Won't Fix | High | | |
| systemd (Ubuntu) | Fix Released | Critical | Martin Pitt | ubuntu-15.02 |
Bug Description
Since yesterday, systemd does not boot properly any more. It takes very long, ends up in an X failsafe session, and eventually you just get a getty on VT1.
Downgrading libpam-systemd libsystemd0 systemd systemd-sysv to https:/
This happens both with the standard "quiet splash $vt_handoff" options and without these three options, i. e. in a text mode boot.
$ sudo systemctl list-jobs
JOB UNIT TYPE STATE
1634 sound.target stop waiting
1176 NetworkManager.
1598 cgproxy.service start waiting
1632 failsafe-
1563 alsa-restore.
1594 plymouth-
1588 getty-static.
1562 alsa-state.service start waiting
121 systemd-
1640 <email address hidden> stop waiting
1544 systemd-
1507 friendly-
1607 anacron.service start waiting
1489 systemd-
1637 system-
1636 <email address hidden> stop waiting
1631 failsafe-x.service stop waiting
1638 systemd-
1610 pppd-dns.service start waiting
1483 systemd-
1635 system-
1557 systemd-
1630 systemd-
1500 sys-kernel-
92 multi-user.target start waiting
1509 debian-
1641 <email address hidden> stop waiting
1613 plymouth-
1639 system-ifup.slice stop waiting
1642 <email address hidden> stop waiting
1633 acpid.service stop waiting
1643 systemd-
1543 plymouth-
1499 plymouth-
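For triaging output like the job list above, a minimal Python sketch that picks out the jobs still waiting could look like this. The helper names `parse_jobs` and `waiting_jobs` are hypothetical, and the sample text only reuses three complete rows from the list; rows that appear truncated in this report (or with a hidden unit name) do not have exactly four columns and are skipped.

```python
def parse_jobs(text):
    """Parse `systemctl list-jobs` rows into (job_id, unit, type, state) tuples."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # A complete row has exactly four columns and a numeric job ID;
        # the header line and truncated rows are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            jobs.append((int(fields[0]), fields[1], fields[2], fields[3]))
    return jobs


def waiting_jobs(jobs):
    """Return the unit names of all jobs that are still in the 'waiting' state."""
    return [unit for _, unit, _, state in jobs if state == "waiting"]


# Three complete rows copied from the report, plus the header line.
sample = """\
JOB UNIT TYPE STATE
1634 sound.target stop waiting
1598 cgproxy.service start waiting
92 multi-user.target start waiting
"""

print(waiting_jobs(parse_jobs(sample)))
```

A non-empty result long after boot, as in this report, suggests that the job queue is stuck rather than merely slow.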
Attaching debug journal.
ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: systemd 219-1ubuntu1
ProcVersionSign
Uname: Linux 3.18.0-13-generic x86_64
ApportVersion: 2.16.1-0ubuntu2
Architecture: amd64
CurrentDesktop: Unity
Date: Fri Feb 20 07:39:03 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-11-20 (91 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Alpha amd64 (20141119)
MachineType: LENOVO 2324CTO
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/09/2013
dmi.bios.vendor: LENOVO
dmi.bios.version: G2ET95WW (2.55 )
dmi.board.
dmi.board.name: 2324CTO
dmi.board.vendor: LENOVO
dmi.board.version: 0B98401 Pro
dmi.chassis.
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.
dmi.modalias: dmi:bvnLENOVO:
dmi.product.name: 2324CTO
dmi.product.
dmi.sys.vendor: LENOVO
| Martin Pitt (pitti) wrote : | #1 |
| Changed in systemd (Ubuntu): | |
| assignee: | nobody → Martin Pitt (pitti) |
| importance: | Undecided → Critical |
| status: | New → In Progress |
| milestone: | none → ubuntu-15.02 |
| description: | updated |
| description: | updated |
| Martin Pitt (pitti) wrote : | #2 |
| Martin Pitt (pitti) wrote : | #3 |
I can reproduce this quite reliably with:
sudo rm -r /var/log/journal
sudo mkdir /var/log/journal
sudo apt-get install --reinstall systemd
I now managed to reproduce the hang in a VM as well, with these commands.
| Martin Pitt (pitti) wrote : | #4 |
I continued to bisect binaries (/lib/systemd/
I never got a hang with 219 + journald from 218, and I did get a hang with 218 + journald from 219, which indicates that this is related to journald. Didier confirmed the latter hang.
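The reasoning behind this binary swap can be sketched in Python. This is only an illustration: `suspect_components` is a hypothetical helper, "systemd" stands for everything except journald, and the two baseline rows (all 218 boots fine, all 219 hangs) are implied by the report rather than stated in this comment.

```python
def suspect_components(runs):
    """Return the components whose version alone predicts the hang in every run.

    `runs` is a list of (versions, hung) pairs, where `versions` maps a
    component name to the release its binary was taken from.
    """
    suspects = []
    for name in runs[0][0]:
        outcome_by_version = {}
        consistent = True
        for versions, hung in runs:
            # Remember the first outcome seen for this version and compare later runs to it.
            if outcome_by_version.setdefault(versions[name], hung) != hung:
                consistent = False
                break
        # A suspect must be consistent and must actually separate good from bad boots.
        if consistent and len(set(outcome_by_version.values())) == 2:
            suspects.append(name)
    return suspects


runs = [
    ({"systemd": "218", "journald": "218"}, False),  # baseline: 218 boots fine
    ({"systemd": "219", "journald": "219"}, True),   # baseline: 219 hangs
    ({"systemd": "219", "journald": "218"}, False),  # 219 + journald from 218: no hang
    ({"systemd": "218", "journald": "219"}, True),   # 218 + journald from 219: hang
]

print(suspect_components(runs))
```

Only the journald version lines up with the outcome in all four rows, which matches the conclusion drawn above.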
| Martin Pitt (pitti) wrote : | #5 |
This is an automated reproducer in the form of an autopkgtest. You can call it like this:
adt-run --built-tree ./systemd-bootsmoke --- qemu /srv/vm/
This reboots the VM up to 20 times, and checks for stuck jobs or non-running polkit on each iteration. You can also test with modified binaries by copying them into the testbed's /tmp/systemd-
adt-run --built-tree ./systemd-bootsmoke --copy /tmp/systemd-
With e. g. systemd-journald from 218 the test passes. Note that there are sometimes testbed timeouts and other testbed setup failures, due to various race conditions.
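The control flow of such a boot smoke test can be sketched as follows. This is not the attached autopkgtest, just a minimal Python outline under stated assumptions: `boot_smoke` and its three callables (`reboot`, `stuck_jobs`, `polkit_running`) are hypothetical stand-ins for whatever the real test does inside the testbed.

```python
def boot_smoke(reboot, stuck_jobs, polkit_running, max_boots=20):
    """Reboot up to `max_boots` times and stop at the first bad boot.

    Returns None if every boot looked healthy, otherwise a tuple of
    (boot number, reason).
    """
    for boot in range(1, max_boots + 1):
        reboot()
        jobs = stuck_jobs()  # names of jobs still queued after boot
        if jobs:
            return boot, "stuck jobs: " + ", ".join(jobs)
        if not polkit_running():
            return boot, "polkit is not running"
    return None


# Fake testbed for illustration: the third boot leaves a job in the queue.
state = {"boots": 0}


def fake_reboot():
    state["boots"] += 1


def fake_stuck_jobs():
    return ["multi-user.target"] if state["boots"] == 3 else []


print(boot_smoke(fake_reboot, fake_stuck_jobs, lambda: True))
```

Stopping at the first bad boot keeps the failing testbed state around for inspection.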
I added a few robustifications to autopkgtest 3.9.7 (just uploaded to Debian sid, will sync into vivid this evening). You can of course also just check out the latest autopkgtest git and call /checkout/
In principle this works with a standard adt-buildvm-
- Prepare the VM for autopkgtest (enable serial console and provide the root shell on ttyS1 mostly) by running /usr/share/
- Install systemd-sysv
With such an image I usually get the hang after 2 or 3 reboots already, so that the 20 iterations that the test does should be quite sufficient.
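As a rough check of why 20 iterations are enough, one can compute how many boots are needed to see at least one hang with a given confidence. This is a back-of-the-envelope sketch: it assumes boots are independent and reads "after 2 or 3 reboots" as a per-boot hang probability of roughly 1/3 to 1/2; both are assumptions, and `boots_needed` is a hypothetical helper.

```python
import math


def boots_needed(hang_probability, confidence):
    """Smallest number of independent boots that shows at least one hang
    with probability >= `confidence`."""
    # P(no hang in n boots) = (1 - p)^n must drop to 1 - confidence or below.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_probability))


print(boots_needed(1 / 3, 0.999))  # 18
print(boots_needed(1 / 2, 0.999))  # 10
```

Under these assumptions even the more pessimistic estimate stays below the 20 reboots the test performs.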
I'll add this test to the systemd package soon, but for now this is standalone for easier hacking.
| Martin Pitt (pitti) wrote : | #6 |
I'm running "git bisect start v219 v218 -- src/shared src/journal" on upstream trunk now (some 220 commits to test), using the above autopkgtest. I don't use git bisect run, as the autopkgtest sometimes fails with exit code 16 (testbed setup); in those cases it should be re-run until it either succeeds or "properly" fails the test (exit code 4).
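For a sense of the effort involved: bisection halves the candidate range with every tested commit, so the number of test rounds grows only logarithmically. A small sketch (the helper name `bisect_steps` is made up, and the figure is an approximation, since git's actual count depends on the shape of the history):

```python
import math


def bisect_steps(candidate_commits):
    """Approximate number of commits that must be tested to isolate one culprit."""
    return math.ceil(math.log2(candidate_commits))


print(bisect_steps(220))  # 8
```

So roughly eight rounds of the boot test should be enough for the 220 or so commits, not counting re-runs after testbed failures.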
| Martin Pitt (pitti) wrote : | #7 |
Bisect run finished. The culprit is
http://
This reverts cleanly against master, and 219 runs stably with a journald built with that commit reverted.
| Martin Pitt (pitti) wrote : | #8 |
This is an improved git bisect script which is now robust against transient testbed failures and "make" build failures. This is now suitable for a fully automated "git bisect run ./systemd-
This is now mostly of historical interest of course, but it'll provide a nice starting point for the next regression.
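The decision logic such a script needs can be sketched as follows. This is not the attached script, only a minimal Python outline: `bisect_step`, `build` and `run_test` are hypothetical names, the autopkgtest exit codes 4 and 16 are the ones mentioned earlier in this report, and the return values follow the `git bisect run` convention (0 = good, 1 to 127 except 125 = bad, 125 = skip this commit).

```python
def bisect_step(build, run_test, max_attempts=3):
    """Return an exit code for `git bisect run` for the current commit.

    `build` returns True if the tree built; `run_test` returns the exit
    code of one boot test run.
    """
    if not build():
        return 125  # build failure: ask git bisect to skip this commit
    for _ in range(max_attempts):
        code = run_test()
        if code == 0:
            return 0    # test passed: commit is good
        if code == 4:
            return 1    # test "properly" failed: commit is bad
        if code != 16:
            return 125  # unknown result (a choice made here): skip rather than guess
        # code 16 is a transient testbed failure, so try again
    return 125  # still no conclusive result after all attempts


# Illustration with canned exit codes: two testbed failures, then a real failure.
codes = iter([16, 16, 4])
print(bisect_step(lambda: True, lambda: next(codes)))
```

A real wrapper would pass the result to `sys.exit()` so that `git bisect run` can act on it.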
| Martin Pitt (pitti) wrote : | #9 |
autopkgtest committed: http://
| Changed in systemd (Ubuntu): | |
| status: | In Progress → Fix Committed |
| Changed in systemd (Debian): | |
| status: | Unknown → Confirmed |
| Launchpad Janitor (janitor) wrote : | #10 |
This bug was fixed in the package systemd - 219-3ubuntu1
---------------
systemd (219-3ubuntu1) vivid; urgency=medium
* Merge with Debian experimental. Remaining Ubuntu changes:
- Hack to support system-image read-only /etc, and modify files in
/
- Keep our much simpler udev maintainer scripts (all platforms must
support udev, no debconf).
- initramfs init-top: Drop $ROOTDELAY, we do that in a more sensible way
with wait-for-root. Will get applicable to Debian once Debian gets
wait-for-root in initramfs-tools.
- initramfs init-bottom: If LVM is installed, settle udev,
otherwise we get missing LV symlinks. Workaround for LP #1185394.
- Add debian/
dependencies to "lvm2" which is handled with udev rules in Ubuntu.
- Provide shutdown fallback for upstart. (LP: #1370329)
- debian/
really support "allow-hotplug" in Ubuntu at the moment, so we need to
deal with "auto" devices appearing after "/etc/init.
already ran. (LP: #1374521) Also, check if devices are actually defined
in /etc/network/
- ifup@.service: Drop dependency on networking.service (i. e.
/
This avoids unnecessary dependencies/
cycles if hooks wait for other interfaces to come up (like ifenslave
with bonding interfaces). (LP: #1414544)
- Add Get-RTC-
Ubuntu we currently keep the setting whether the RTC is in local or UTC
time in /etc/default/rcS "UTC=yes|no", instead of /etc/adjtime.
(LP: #1377258)
- Put session scopes into all cgroup controllers. This makes unprivileged
user LXC containers work under systemd. (LP: #1346734)
- Lower Breaks: to plymouth version which has the udev inotify fix in
Ubuntu.
- Lower libapparmor1 dep to the Ubuntu version where it moved to /lib.
- Make failure of boot-and-services NSpawn.test_boot non-fatal for now.
This currently fails when being triggered by Jenkins, but is totally
unreprodu
Upgrade fixes, keep until 16.04 LTS release:
- systemd Conflicts/
- Remove obsolete systemd-logind upstart job.
- Clean up obsolete /etc/udev/
systemd (219-3) experimental; urgency=medium
* sysv-generator: fix wrong "Overwriting existing symlink" warnings.
(Closes: #778700)
* Add systemd-fsckd multiplexer and feed its output to plymouth. This
provides an aggregate progress report of running file system checks and
also allows cancelling them with ^C, in both text mode and Plymouth.
(Closes: #775093, #758902; LP: #1316796)
* Revert "journald: allow restarting journald without losing stream
connections". This was a new feature in 219, but currently causes boot
failures due to logind an...
| Changed in systemd (Ubuntu): | |
| status: | Fix Committed → Fix Released |
| Changed in systemd (Debian): | |
| status: | Confirmed → Fix Released |
| Changed in systemd: | |
| assignee: | nobody → Martin Pitt (pitti) |
| Martin Pitt (pitti) wrote : | #11 |
I'm getting a similar (perhaps/hopefully the same) hang if I do this:
systemctl mask --runtime systemd-journald systemd-logind; systemctl stop systemd-journald systemd-logind
SYSTEMD_
# restart 5 times
SYSTEMD_
# restart a few times
After a few restarts, logind fails to start with the 25s D-Bus timeout and the error message
Failed to add match for NameOwnerChanged: Connection timed out
Failed to fully start up daemon: Connection timed out
From then on it keeps failing. This can be reset by running kill -9 on the system dbus-daemon; systemd will auto-restart it, and then logind can start again.
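To spot this failure mode automatically, e. g. in a reproducer loop, one could scan the collected log lines for the two messages quoted above. A minimal Python sketch; the function name `logind_timed_out` is hypothetical and how the log lines are collected is left open:

```python
# The two error messages quoted in this comment.
LOGIND_TIMEOUT_MESSAGES = (
    "Failed to add match for NameOwnerChanged: Connection timed out",
    "Failed to fully start up daemon: Connection timed out",
)


def logind_timed_out(log_lines):
    """Return True if any log line contains one of the logind D-Bus timeout messages."""
    return any(
        message in line
        for line in log_lines
        for message in LOGIND_TIMEOUT_MESSAGES
    )


print(logind_timed_out(["Failed to fully start up daemon: Connection timed out"]))
print(logind_timed_out(["placeholder for an unrelated log line"]))
```

Matching on substrings keeps the check independent of the timestamp and host prefix that the journal puts in front of each message.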
| Martin Pitt (pitti) wrote : | #12 |
I don't get the hang with the reproducer in comment 11 on Fedora 21 with systemd 219 and dbus 1.8.16 from rawhide.
| | #19 |
Created attachment 1001544
failure log
Since around March 6th, my Fedora 22 desktop system has frequently failed to boot properly.
Unfortunately the last time I booted the system before that was February 23rd, so there's quite a big list of possible changes in there.
I have tried downgrading from systemd-219-8.fc22 to systemd-219-5.fc22 and from kernel-
When the boot fails, various services don't attempt to start for some time, and when they do try to start, many fail. The system never manages to start any consoles or gdm.
From the debug console (tty9), running 'systemctl' appears to do nothing for a long time, then shows an error like "Failed to register match for disconnected message".
I will attach logs of both failed and successful boots with systemd.log_level = debug.
The boot fails on average ~3 in 4 times.
| | #20 |
Created attachment 1001545
success log
In the upcoming systemd update there's a fix for socket path handling. I'm not sure if this could be the same issue, but it'd be good if you could test if it helps.
| | #22 |
systemd-219-9.fc22 has been submitted as an update for Fedora 22.
https:/
| | #23 |
Nope, sorry, doesn't help :/ saw the bug on 2 of 3 test boots with 219-9.
| | #24 |
Package systemd-219-9.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=
as soon as you are able to.
Please go to the following url:
https:/
then log in and leave karma (feedback).
| | #25 |
This is definitely a bug in 219. I have the same issue with my rhel build.
Unfortunately, on my machine it occurs quite rarely, so it's hard to debug.
| | #26 |
I see a similar problem; my case is described in bug 1204023.
| | #27 |
Lukas: it occurs frequently for me, I'm happy to do any debugging anyone requests, but I'd really like to see this fixed. It's kind of unfortunate that my system neither starts up nor shuts down properly most of the time! (I suffer from one or other of the bugs in the disaster zone that is https:/
| | #28 |
Just guessing... This could be the same problem:
http://
Martin Pitt is reverting commit 13790add4bf648f
| | #29 |
I just tested with systemd-219-11.fc22 and 8 boots in a row didn't hit the bug once. It may be a bit early to declare 'mission accomplished', but it looks good.
zbyszek, any chance you could submit an update for 219-11?
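For what it's worth, the odds of such a streak being pure luck can be estimated. This sketch assumes the boots are independent and takes the failure rate of about 3 in 4 from the original Fedora report in this thread; `clean_streak_probability` is a made-up helper name.

```python
def clean_streak_probability(failure_rate, boots):
    """Probability that `boots` independent boots all succeed."""
    return (1.0 - failure_rate) ** boots


# About 3 in 4 boots failed before; 8 boots in a row succeeded with 219-11.
print(clean_streak_probability(0.75, 8))  # 1.52587890625e-05
```

That is about 1 in 65536, so under these assumptions something about the failure rate on this machine has most likely changed, whether or not the underlying bug is actually fixed.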
| Adam Williamson (awilliamson) wrote : | #13 |
FWIW I was seeing something that sounds very similar on F22:
https:/
but it *seems* to have gone away with systemd-
219-9.fc22 was definitely broken for me; that build had 'v219-stable' as of commit 4acdc3835b2c9d3
| Martin Pitt (pitti) wrote : | #14 |
I just applied all v219-stable patches on top of our package, and dropped the reverted patch. I now get the boot failures again, so I'm afraid that wasn't it.
| | #30 |
I just applied all v219-stable patches on top of our package, and re-applied 13790a again (i. e. dropped the revert). I now get the boot failures again, so I'm afraid the patches in v219-stable aren't sufficient.
| | #31 |
Hum, interesting. So either I got really lucky yesterday, or we're actually seeing two different bugs, or (I suppose) latest stable changes things enough to hide whatever's triggering the bug when I boot my system, but not to fix your reproducer...
| Martin Pitt (pitti) wrote : | #15 |
Also tracked in Arch Linux: https:/
| Martin Pitt (pitti) wrote : | #16 |
I cannot reproduce this with current upstream git master any more, so I ran another bisect. Apparently http://
http://
http://
Now the boot-smoke test still succeeds with that package.
| Changed in systemd: | |
| status: | New → Fix Released |
| | #32 |
Good news! It seems this finally got fixed (unintentionally) in upstream git master, and the patch backports cleanly to 219. See
http://
for the details.
| | #34 |
Guys,
Is this already fixed in F22?
I came here via a search on 'fedora 22 avahi-daemon fails'.
avahi-daemon fails to start on one of my servers, but not on others
| | #35 |
(In reply to Ferry Huberts from comment #15)
> Guys,
> Is this already fixed in F22?
>
> I came here via a search on 'fedora 22 avahi-daemon fails'.
> avahi-daemon fails to start on one of my servers, but not on others
Never mind, this was an install failure for the avahi package: the avahi user was not created.
I've seen this with the tcpdump packages as well.
I think there might be some rpm/dnf issue there.
| quequotion (quequotion) wrote : | #17 |
I am very, very painfully going through this process right now.
I REALLY need dpkg to stop trying to remove colord and network-manager (dependent on policykit-1). It would also help if Ubuntu Web Browser were a little more responsive. It took 10 minutes to type this far... probably unrelated.
Is the reboot absolutely necessary? I don't think my system will come back.
| eiro (eiro1980) wrote : | #18 |
I'm on Ubuntu 15.10 and still have the same problem:
Feb 10 09:07:09 DELL-PC systemd-
Feb 10 09:07:09 DELL-PC systemd-
Feb 10 09:07:35 DELL-PC dbus[729]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:07:35 DELL-PC dbus[729]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:07:35 DELL-PC systemd-
Feb 10 09:07:35 DELL-PC systemd-
Feb 10 09:07:36 DELL-PC dbus[729]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:07:36 DELL-PC dbus[729]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:07:45 DELL-PC dbus[729]: [system] Failed to activate service 'fi.w1.
Feb 10 09:08:10 DELL-PC dbus[729]: [system] Failed to activate service 'fi.w1.
Feb 10 09:08:10 DELL-PC dbus[729]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:10:47 DELL-PC dbus[765]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:10:47 DELL-PC systemd-
Feb 10 09:10:47 DELL-PC dbus[765]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:10:47 DELL-PC systemd-
Feb 10 09:10:47 DELL-PC dbus[765]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:10:48 DELL-PC dbus[765]: [system] Failed to activate service 'org.freedeskto
Feb 10 09:10:56 DELL-PC dbus[765]: [system] Failed to activate service 'fi.w1.
Feb 10 09:11:21 DELL-PC dbus[765]: [system] Failed to activate service 'fi.w1.
Feb 10 09:11:21 DELL-PC dbus[765]: [system] Failed to activate service 'org.freedeskto
| | #36 |
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.
Thank you for reporting this bug and we are sorry it could not be fixed.
| Changed in systemd (Fedora): | |
| importance: | Unknown → High |
| status: | Unknown → Won't Fix |


Notekeeping:
- installing self-built .debs: boot ok
- reinstalling ubuntu debs: boot fail
- Removing the persistent journal (sudo rm -r /var/log/journal) with the ubuntu binaries: boot ok
- rebooting a few times after re-enabling the persistent journal: boot ok
- QEMU with the persistent journal enabled, booting a few times: boot ok
- Didier has no persistent journal and got the issue once, so the persistent journal only appears to change the timing, not anything fundamental
So now I need to figure out how to make boot fail again.