shutdown hangs at "Waiting for process: ..." for 90s, ignoring DefaultTimeoutStopSec

Bug #1958284 reported by Jean Raby
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| systemd (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Focal | Fix Released | Medium | Unassigned | |

Bug Description

[Impact]

The systemd shutdown sequence does not honor systemd-system.conf settings when waiting for remaining processes. This means that, for example, if a systemd service specifies KillMode=process and a process remaining from that service does not properly handle SIGTERM, then the remaining process will not be killed until after the compiled-in default value of DefaultTimeoutStopSec (90s), even if the user has changed the setting of DefaultTimeoutStopSec. In such cases, this impacts users by significantly increasing the time required for shutdown/reboot.

[Test Plan]

* Create a new script, /usr/local/bin/loop-ignore-sigterm:
  ```
  #!/bin/bash
  loop_forever() {
      while true; do sleep 1; done
  }

  (
  trap 'echo Ignoring SIGTERM...' SIGTERM
  loop_forever
  )

  loop_forever
  ```

  This script will spawn a subshell which will loop forever and ignore
  SIGTERM. This will force systemd to wait for the subprocess at
  reboot/shutdown, and eventually send SIGKILL after TimeoutStopSec
  (DefaultTimeoutStopSec in this case).

* Make the script executable:
  $ chmod +x /usr/local/bin/loop-ignore-sigterm

* Create a systemd service for this script. Add the following to
  /etc/systemd/system/loop-ignore-sigterm.service:
  ```
  [Service]
  KillMode=process
  ExecStart=/usr/local/bin/loop-ignore-sigterm
  ```

* Start the service:
  $ systemctl start loop-ignore-sigterm.service

* Edit /etc/systemd/system.conf, and uncomment the
  'DefaultTimeoutStopSec=90s' line. Modify 90s to something much shorter,
  e.g. 20s.

* Re-exec the daemon so this new default takes effect:
  $ systemctl daemon-reexec

* Reboot, and monitor the logs. Observe that systemd-shutdown will wait
  for the loop-ignore-sigterm process for 90s, instead of the 20s
  configured earlier.
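The configuration step of the test plan can also be scripted. The sketch below is a hypothetical helper (`set_default_timeout` is a name chosen here, not a systemd tool). It assumes GNU sed and the stock Focal /etc/systemd/system.conf, where the setting ships commented out as `#DefaultTimeoutStopSec=90s`; if neither the commented nor the active form is present, it returns an error rather than guessing where in the file the line belongs.

```bash
# Hypothetical helper, not part of systemd or the original test plan.
# Sets DefaultTimeoutStopSec in a systemd-system.conf style file by
# uncommenting/overwriting the existing "#DefaultTimeoutStopSec=..." or
# "DefaultTimeoutStopSec=..." line. Assumes GNU sed (for in-place -i).
set_default_timeout() {
    local conf="$1" value="$2"
    if [ -z "$conf" ] || [ -z "$value" ]; then
        echo "usage: set_default_timeout FILE VALUE" >&2
        return 2
    fi
    if ! grep -Eq '^#?DefaultTimeoutStopSec=' "$conf"; then
        # Do not append blindly: the line must live in the [Manager] section.
        echo "no DefaultTimeoutStopSec line found in $conf" >&2
        return 1
    fi
    # Drop a leading '#' (if any) and replace the value in place.
    sed -i -E "s/^#?DefaultTimeoutStopSec=.*/DefaultTimeoutStopSec=${value}/" "$conf"
}
```

As root, running `set_default_timeout /etc/systemd/system.conf 20s` followed by `systemctl daemon-reexec` should reproduce the configuration steps above before rebooting.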

[Where problems could occur]

The patch moves the reset_arguments() call to the end of main, which means reset_arguments() is no longer called before daemon re-execution (if that branch is taken). If anything in that code path relied on reset_arguments() being called before re-executing, those assumptions could be broken. Any such problems would potentially be seen during daemon re-execution, e.g. when calling systemctl daemon-reexec.
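Since any fallout would most likely show up during daemon re-execution, one quick sanity check after `systemctl daemon-reexec` is to confirm that the manager still answers and reports the configured default. The function below is a hypothetical helper; it relies on `systemctl show` with no unit printing manager properties, and on `--value`, which is available in v245. It expects the value in systemctl's human-readable form (e.g. `20s`, or `1min 30s` for the 90s default). It only exercises the re-exec path: confirming the shutdown fix itself still requires a reboot, as in the test plan.

```bash
# Hypothetical check, not from the bug report: compare the manager's
# DefaultTimeoutStopUSec property against an expected timespan string.
# Returns 0 on match, 1 on mismatch, 2 if systemctl itself fails.
manager_timeout_is() {
    local expected="$1" actual
    actual="$(systemctl show --property=DefaultTimeoutStopUSec --value)" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "DefaultTimeoutStopUSec=${actual} (as expected)"
    else
        echo "DefaultTimeoutStopUSec=${actual}, expected ${expected}" >&2
        return 1
    fi
}
```

For example: `sudo systemctl daemon-reexec && manager_timeout_is 20s`.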

[ Original Description ]

With systemd v245 as shipped with 20.04, the shutdown sequence does not use the value of `DefaultTimeoutStopSec` to wait for remaining processes; it instead uses the compiled-in default of 90s.

This is most visible with services that use `KillMode=process` (docker, k8s, k3s, etc.), especially if the remaining processes do not handle `SIGTERM` or choose to ignore it.

For example:
```
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
[ 243.652848 ] systemd-shutdown[1]: Waiting for process: containerd-shim, containerd-shim, containerd-shim, fluent-bit

--- hangs here for 90s even if DefaultTimeoutStopSec is set to a lower value ---

```

The bug has been fixed upstream here: https://github.com/systemd/systemd/commit/7d9eea2bd3d4f83668c7a78754d201b22

Marc was kind enough to package the patch for 20.04 so I could test it (https://launchpad.net/~mdeslaur/+archive/ubuntu/testing/+sourcepub/13210617/+listing-archive-extra) and with that package, I can confirm that it indeed fixes the issue.

Here are a few GitHub issues I stumbled upon while trying to debug this, along with a short write-up of the workaround I ended up using:

- https://github.com/moby/moby/issues/41831
- https://github.com/k3s-io/k3s/issues/2400
- https://github.com/systemd/systemd/issues/16991
- https://raby.sh/debugging-90s-hangs-during-shutdown-on-ubuntu-2004.html

Of course, it would be much better if all processes properly handled `SIGTERM`, but having a way to enforce a maximum wait time at shutdown is a decent workaround.

Given that the patch is relatively simple, would it be possible to add it to the package for 20.04?

Thanks

Jean Raby (g-jean)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Lukas Märdian (slyon)
tags: added: rls-ff-incoming
tags: added: fr-1987
tags: removed: fr-1987 rls-ff-incoming
Changed in systemd (Ubuntu Focal):
status: New → Confirmed
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Focal):
importance: Undecided → Medium
Marc Deslauriers (mdeslaur) wrote :

Any updates on this?

Lukas Märdian (slyon) wrote :

It has recently been picked up by Foundations, and we should have the capacity to start working on this next week.

Nick Rosbrook (enr0n)
description: updated
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Focal):
status: Confirmed → In Progress
Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Jean, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.16)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.16) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.44.1-1ubuntu1 (arm64, ppc64el, amd64)
linux-aws-5.13/5.13.0-1019.21~20.04.1 (arm64)
snapd/2.54.3+20.04.1ubuntu0.2 (arm64, ppc64el, s390x)
docker.io/20.10.7-0ubuntu5~20.04.2 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Nick Rosbrook (enr0n) wrote :

I tested systemd 245.4-4ubuntu3.16 from focal-proposed using the test plan above. I observed that the loop-ignore-sigterm.service processes were killed after ~20s on shutdown, which is what I configured in /etc/systemd/system.conf.

tags: added: verification-done-focal
removed: verification-needed-focal
Nick Rosbrook (enr0n) wrote :

The autopkgtest regressions blocking systemd 245.4-4ubuntu3.16 in focal-proposed have been resolved. The regressions appear to have been related to recent autopkgtest infrastructure issues, and retrying the tests resolved the issues.

Nick Rosbrook (enr0n)
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.16

---------------
systemd (245.4-4ubuntu3.16) focal; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1946388-sd-journal-don-t-check-namespaces-if-we-have-no-name.patch:
    Avoid journalctl segfault (LP: #1946388)

  [ Jeremy Szu ]
  * Add a allowlist to unblock intel-hid on new HP machines (LP: #1955997)
    Author: Jeremy Szu
    File: debian/patches/lp1955997-add-a-allowlist-to-unblock-intel-hid-on-HP-mach.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=88a859eaddb6c9a611fcbc44edab441aef4c4355

  [ Nick Rosbrook ]
  * Prevent arguments from being overwritten with defaults at shutdown (LP: #1958284)
    File: debian/patches/lp1958284-core-move-reset_arguments-to-the-end-of-main-s-finish.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e61052bd1f20bcc54e7417542c6d445cf5040f56

  [ Lukas Märdian ]
  * Fix deadlock between pid1 and dbus-daemon (LP: #1871538)
    Author: Lukas Märdian
    File: debian/patches/pid1-set-SYSTEMD_NSS_DYNAMIC_BYPASS-1-env-var-for-dbus-da.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e3aacfa26e3fc6df369e6f28e740389ae0020907

 -- Nick Rosbrook <email address hidden> Wed, 23 Mar 2022 09:29:33 -0400

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
