Reloading services can result in a deadlock under systemd
| Affects | Status | Importance | Assigned to | Milestone | |
|---|---|---|---|---|---|
| | samba (Debian) |
Fix Released
|
Unknown
|
||
| | systemd (Debian) |
Fix Released
|
Unknown
|
||
| | systemd (Ubuntu) |
High
|
Martin Pitt | ||
| | sysvinit (Debian) |
Fix Released
|
Unknown
|
||
| | sysvinit (Ubuntu) |
Medium
|
Martin Pitt | ||
Bug Description
since today (although haven't rebooted in a few days), systemd init is hanging trying to bring up ifup@eth0. I had to revert to upstart boot to get in, although sometimes it would successfully boot with systemd.debug-shell
JOB UNIT TYPE STATE
120 winbind.service start waiting
175 nmbd.service start waiting
159 libvirt-bin.service start waiting
151 dns-clean.service start waiting
116 vmware-
177 smbd.service start waiting
178 fail2ban.service start waiting
331 rpcbind.service start waiting
109 apache2.service start waiting
124 openvpn.service start waiting
184 x-display-
173 vmware.service start waiting
143 <email address hidden> start waiting
141 samba-ad-dc.service start waiting
2 multi-user.target start waiting
332 rpcbind.target start waiting
108 hddtemp.service start waiting
172 plymouth-
152 autofs.service start waiting
123 schroot.service start waiting
130 mail-transport-
129 postfix.service start waiting
128 isc-dhcp-
131 mysql.service start waiting
142 getty.target start waiting
164 sabnzbdplus.service start waiting
1 graphical.target start waiting
176 apt-cacher-
183 gdm.service start waiting
174 ssh.service start waiting
77 network-
112 systemd-
133 urfkill.service start waiting
111 kerneloops.service start waiting
161 rc-local.service start waiting
106 lxc.service start waiting
107 lxc-net.service start waiting
155 virtualbox.service start waiting
191 <email address hidden> start running
50 network.target start waiting
40 jobs listed.
* <email address hidden> - ifup for eth0
Loaded: loaded (/lib/systemd/
Active: activating (start-post) since Mon 2015-02-02 19:17:25 AEDT; 9min ago
Process: 800 ExecStart=
Main PID: 800 (code=exited, status=0/SUCCESS); : 803 (ifup)
CGroup: /<email address hidden>
|- 803 /sbin/ifup --allow=auto eth0
|- 832 /bin/sh -c dhclient -1 -v -pf /run/dhclient.
|- 833 dhclient -1 -v -pf /run/dhclient.
|-1269 /bin/sh /sbin/dhclient-
|-1273 /bin/sh /usr/sbin/
`-1294 systemctl reload smbd.service
Feb 02 19:17:25 duhast dhclient[833]: Sending on Socket/fallback
Feb 02 19:17:25 duhast dhclient[833]: DHCPREQUEST of 192.168.15.67 on eth0 to 255.255.255.255 port 67 (xid=0x502572c4)
Feb 02 19:17:25 duhast ifup[803]: Listening on LPF/eth0/
Feb 02 19:17:25 duhast ifup[803]: Sending on LPF/eth0/
Feb 02 19:17:25 duhast ifup[803]: Sending on Socket/fallback
Feb 02 19:17:25 duhast ifup[803]: DHCPREQUEST of 192.168.15.67 on eth0 to 255.255.255.255 port 67 (xid=0x502572c4)
Feb 02 19:17:28 duhast dhclient[833]: DHCPREQUEST of 192.168.15.67 on eth0 to 255.255.255.255 port 67 (xid=0x502572c4)
Feb 02 19:17:28 duhast ifup[803]: DHCPREQUEST of 192.168.15.67 on eth0 to 255.255.255.255 port 67 (xid=0x502572c4)
Feb 02 19:17:28 duhast dhclient[833]: DHCPACK of 192.168.15.67 from 192.168.15.1
Feb 02 19:17:28 duhast ifup[803]: DHCPACK of 192.168.15.67 from 192.168.15.1
ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: systemd 218-6ubuntu1
ProcVersionSign
Uname: Linux 3.18.0-12-generic x86_64
NonfreeKernelMo
ApportVersion: 2.15.1-0ubuntu4
Architecture: amd64
CurrentDesktop: GNOME
Date: Mon Feb 2 19:47:21 2015
InstallationDate: Installed on 2012-09-23 (862 days ago)
InstallationMedia: Ubuntu GNOME Remix 12.10 "Quantal Quetzal" - Alpha amd64(20120922)
MachineType: Gigabyte Technology Co., Ltd. P67A-UD3R-B3
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: systemd
SystemdDelta:
[EXTENDED] /lib/systemd/
1 overridden configuration files found.
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/21/2012
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F9
dmi.board.name: P67A-UD3R-B3
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSof
dmi.product.name: P67A-UD3R-B3
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
Related branches
| Tim (darkxst) wrote : | #1 |
| Martin Pitt (pitti) wrote : | #2 |
| Changed in systemd (Ubuntu): | |
| importance: | Undecided → High |
| status: | New → Triaged |
| Tim (darkxst) wrote : | #3 |
journalctl debug logs from hang
| Martin Pitt (pitti) wrote : | #4 |
Keeping notes:
> 50 network.target start waiting
This is another bug, because ifup@.service has Before=
| Martin Pitt (pitti) wrote : | #5 |
I was able to fudge a similar situation in a VM. Could you please try and edit /etc/dhcp/
For reference: While smbd is not yet started (as it waits for $network) this will fail with code 1 as the unit can't be reloaded yet.
| Martin Pitt (pitti) wrote : | #6 |
Further explaining: ... and if the DHCP enter hook fails, then the whole DHCP request fails, this ifup@eth0 never succeeds.
| Tim (darkxst) wrote : | #7 |
Martin, that doesn't work either since the invoke-rc.d call never actually returns
| Martin Pitt (pitti) wrote : | #8 |
> since the invoke-rc.d call never actually returns
Incidentally this is being discussed upstream at http://
Can you please try to patch /usr/sbin/
This isn't a complete or completely correct fix, but confirming that it works would be appreciated.
Thanks!
| Martin Pitt (pitti) wrote : | #9 |
The samba DHCP enter hook needs to be fixed too, so adding a task.
| Changed in samba (Ubuntu): | |
| assignee: | nobody → Martin Pitt (pitti) |
| importance: | Undecided → Medium |
| status: | New → In Progress |
| Martin Pitt (pitti) wrote : | #10 |
Keeping notes: This is a well-known property/problem of systemd job transactions. In its simplest form it looks like this:
# cat /etc/systemd/
[Unit]
Description=some foo service
Requires=
After=dep.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/echo foo is running
ExecReload=
# cat /etc/systemd/
[Unit]
Description=
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=
ExecStart=/bin/echo dep is running
ExecStartPost=
# systemctl daemon-reload; systemctl start foo.service
→ hangs forever, and systemctl list-jobs shows:
JOB UNIT TYPE STATE
886 dep.service start running
818 foo.service start waiting
In this form it's obviously bogus (nobody would write a unit like this). But it becomes quite common if the "reload" happens through some indirection, like a DHCP enter/exit or an ifupdown hook.
For these situations the reload should happen with --no-block, to not wait until it's finished, but just to enqueue it. While that's easy enough if a systemd unit itself does that, our various hooks didn't take that into account when they were written, and as they usually use "invoke-rc.d" or "service" (both of which don't know about --no-block) we need a more generic solution/
I proposed some bits on the upstream ML: http://
| Martin Pitt (pitti) wrote : | #11 |
I thought I had DHCP fail yesterday on a failing hook, but I cannot reproduce this today. So the || true was apparently a red herring.
I also linked to the Debian bug which describes the same problem, and has a proposed patch. We could apply that as a short-term solution, or try to fix this in systemd properly.
| Changed in samba (Ubuntu): | |
| assignee: | Martin Pitt (pitti) → nobody |
| status: | In Progress → Triaged |
| Changed in systemd (Ubuntu): | |
| assignee: | nobody → Martin Pitt (pitti) |
| summary: |
- systemd init gets stuck reloading smbd.service + Reloading services can result in a deadlock under systemd |
| Changed in systemd (Ubuntu): | |
| status: | Triaged → In Progress |
| Changed in samba (Ubuntu): | |
| milestone: | none → ubuntu-15.04 |
| milestone: | ubuntu-15.04 → none |
| Changed in systemd (Ubuntu): | |
| milestone: | none → ubuntu-15.03 |
| Changed in samba (Debian): | |
| status: | Unknown → Fix Committed |
| Martin Pitt (pitti) wrote : | #12 |
For keeping references: This is another case like https:/
| Martin Pitt (pitti) wrote : | #13 |
I pondered this for a while; quite naturally there is no perfect solution as synchronous reload/restart requests in hooks which may be part of a systemd unit transaction are fundamentally against the design of systemd. But this patch now seems to fix this bug and the three referenced Debian bugs, and supersedes Avoid-reload-
Attaching here so that I don't lose them.
| Martin Pitt (pitti) wrote : | #14 |
This is another patch I experimented with which helps with a different kind of deadlock problem. But with the previous patch all of the practically reported issues are already covered, so I just keep it here in case we need it in the future.
| Martin Pitt (pitti) wrote : | #15 |
Tim: Can you please revert the change to "service" (apt-get install --reinstall sysvinit-utils) and test the systemd in https:/
| tags: | added: patch |
| Martin Pitt (pitti) wrote : | #16 |
Actually, please forget the PPA; after further discussions with the Debian and upstream systemd maintainers we have settled for a different solution. This now works for me with samba and postfix.
Can you please replace /usr/sbin/service with this script and confirm that it also fixes your boot hang? Thanks!
| Tim (darkxst) wrote : | #17 |
I tested the updated service scripts and indeed does also fix my boot hang!
| Martin Pitt (pitti) wrote : | #18 |
As explained in https:/
| affects: | samba (Ubuntu) → sysvinit (Ubuntu) |
| Changed in sysvinit (Ubuntu): | |
| milestone: | none → ubuntu-15.03 |
| assignee: | nobody → Martin Pitt (pitti) |
| Changed in sysvinit (Debian): | |
| status: | Unknown → New |
| Changed in samba (Debian): | |
| status: | Fix Committed → New |
| Changed in systemd (Debian): | |
| status: | Unknown → New |
| Changed in systemd (Ubuntu): | |
| status: | In Progress → Fix Committed |
| Changed in sysvinit (Ubuntu): | |
| status: | Triaged → Fix Committed |
| Changed in systemd (Debian): | |
| status: | New → Fix Committed |
| Launchpad Janitor (janitor) wrote : | #19 |
This bug was fixed in the package sysvinit - 2.88dsf-53.2ubuntu8
---------------
sysvinit (2.88dsf-
* service, invoke-rc.d: Avoid deadlocks during bootup and shutdown from
units/hooks which call "invoke-rc.d service reload" and similar, since the
synchronous wait plus systemd's normal behaviour of transactionally
processing all dependencies first easily causes dependency loops. Thus
during boot/shutdown operate only on the unit and not on its dependencies,
just like SysV behaves. (Closes: #777113, LP: #1417010)
* Make sysvinit-utils and sysv-rc break systemd << 215 to ensure we have the
"systemctl is-system-running" command.
-- Martin Pitt <email address hidden> Mon, 09 Feb 2015 11:07:10 +0100
| Changed in sysvinit (Ubuntu): | |
| status: | Fix Committed → Fix Released |
| Launchpad Janitor (janitor) wrote : | #20 |
This bug was fixed in the package systemd - 218-8ubuntu2
---------------
systemd (218-8ubuntu2) vivid; urgency=medium
* debian/
visible /sys/fs/
test to check /proc/self/cgroup instead.
-- Martin Pitt <email address hidden> Tue, 10 Feb 2015 10:37:01 +0100
| Changed in systemd (Ubuntu): | |
| status: | Fix Committed → Fix Released |
| Changed in systemd (Debian): | |
| status: | Fix Committed → Fix Released |
| Changed in sysvinit (Debian): | |
| status: | New → Fix Committed |
| Changed in sysvinit (Debian): | |
| status: | Fix Committed → Fix Released |
| Changed in samba (Debian): | |
| status: | New → Fix Released |


It seems this is a dependency loop:
(1) network- online. target waits for <email address hidden> as it's in /etc/network/ interfaces online. target
(2) <email address hidden> starts during boot, triggered by udev; through samba's smbd dhclient-enter hook it calls invoke-rc.d smbd reload
(3) /etc/init.d/smbd has "Requires: $network"
(4) the SysV generator translates $network to network-
We really want (1), (2) and (3) are justified. According to http:// refspecs. linuxbase. org/LSB_ 4.1.0/LSB- Core-generic/ LSB-Core- generic/ facilname. html, $network should be mapped to network.service, not network- online. service. If others disagree and $network should have the "strong" is-online meaning, we need to fix it at (3) instead, e. g. by adding a proper systemd unit to samba. (We should really teach socket activation to stuff..)