Race condition when starting dbus services

Bug #1896614 reported by Victor Tapia
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
systemd
Fix Released
Unknown
systemd (Ubuntu)
Fix Released
Undecided
Unassigned
Bionic
Fix Released
Medium
Victor Tapia

Bug Description

[impact]

In certain scenarios, such as high load environments or when "systemctl daemon-reload" runs at the same time a dbus service is starting (e.g. systemd-logind), systemd is not able to track properly when the service has started, keeping the job 'running' forever.

[test case]

set up a 1-cpu VM with Bionic, and configure the system with a ssh key so the user can ssh to localhost. Then run something like:

$ while timeout 5 ssh localhost true; do echo 'reloading'; sudo systemctl restart systemd-logind & sudo systemctl daemon-reload; done

if that doesn't work try:

$ while timeout 5 ssh localhost true; do echo 'reloading'; sudo sh -c 'systemctl restart systemd-logind & systemctl daemon-reload'; done

once the reproducer exits the while loop, there should be a running job for systemd-logind, and any logins attempted after the bug is reproduced should also hang waiting for the systemd-logind job to complete, e.g.:

ubuntu@lp1896614-b:~$ systemctl list-jobs
JOB UNIT TYPE STATE
525 systemd-logind.service start running
669 session-6.scope start waiting
664 session-5.scope start waiting

3 jobs listed.

[regression potential]

any regression would likely involve services that are Type=dbus failing to complete starting. as with any systemd change, regressions could also involve assertion failures in systemd which causes it to exit.

[scope]

this is needed only for bionic.

this is fixed upstream with commit a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870 which is including starting in v243, so it is included already in focal and later.

(per upstream bug) this was introduced by upstream commit 75152a4d6aedbfd3ee8b2d5782b9edf27407622a which was included starting in v237, so this bug is not present in xenial or earlier.

[original description]

In certain scenarios, such as high load environments or when "systemctl daemon-reload" runs at the same time a dbus service is starting (e.g. systemd-logind), systemd is not able to track properly when the service has started, keeping the job 'running' forever.

The issue appears when systemd runs the "AddMatch" dbus method call to track the service's "NameOwnerChange" once it has already ran. A working instance would look like this:

https://pastebin.ubuntu.com/p/868J6WBRQx/

A failing instance would be:

https://pastebin.ubuntu.com/p/HhJZ4p8dT5/

I've been able to reproduce the issue on Bionic (237-3ubuntu10.42) running:

sudo systemctl daemon-reload & sudo systemctl restart systemd-logind

Victor Tapia (vtapia)
tags: added: sts
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

restarting systemd-logind is not safe, as existing sessions can be logged out.

also performing daemon-reload, mid-boot, also is not safe.

Can you explain the usecase and why these actions are performed together, racing each other?

Revision history for this message
Victor Tapia (vtapia) wrote :

In the original report, the issue happened randomly on boot when a service[1] was triggering a reload while systemd-logind was starting, resulting in a list of queued jobs that were never executed.

The issue can happen too under high load conditions, as reported upstream:
https://github.com/systemd/systemd/issues/12956

To simplify the reproducer I went with systemd-logind+daemon-reload, but it can be done with any other dbus service.

[1]

[Unit]
Description=Disable unattended upgrades
After=network-online.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c "/bin/chmod 644 /etc/cron.daily/apt-compat ; /bin/systemctl disable apt-daily-upgrade.timer apt-daily.timer ; /bin/systemctl stop apt-daily-upgrade.timer apt-daily.timer"

[Install]
WantedBy=multi-user.target

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Victor Tapia (vtapia)
importance: Undecided → Medium
status: New → In Progress
Changed in systemd (Ubuntu):
status: New → Fix Released
Dan Streetman (ddstreet)
description: updated
Dan Streetman (ddstreet)
description: updated
Changed in systemd:
status: Unknown → Fix Released
Dan Streetman (ddstreet)
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Victor, or anyone else affected,

Accepted systemd into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.43 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/237-3ubuntu10.43)

All autopkgtests for the newly accepted systemd (237-3ubuntu10.43) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

linux-hwe-5.4/5.4.0-52.57~18.04.1 (armhf)
lxc/3.0.3-0ubuntu1~18.04.1 (ppc64el)
openssh/1:7.6p1-4ubuntu0.3 (s390x, arm64, armhf, ppc64el, i386, amd64)
linux-aws-5.0/unknown (amd64)
gvfs/1.36.1-0ubuntu1.3.3 (arm64)
suricata/3.2-2ubuntu3 (i386)
nftables/0.8.2-1 (amd64)
libvirt/4.0.0-1ubuntu8.17 (i386)
netplan.io/0.99-0ubuntu3~18.04.3 (amd64)
systemd/237-3ubuntu10.43 (i386)
linux-hwe-5.0/5.0.0-62.67 (arm64, armhf)
docker.io/19.03.6-0ubuntu1~18.04.2 (i386)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Victor Tapia (vtapia) wrote :

# VERIFICATION

Note: As a reminder, the issue here is that there's a race condition between any DBUS service and systemctl daemon-reload, where systemd adds the DBUS filter (AddMatch) that looks for a name change when that has already happened. I'll be using systemd-logind as the DBUS service in my reproducer.

Using the following reproducer:

for i in $(seq 1 1000); do echo $i; ssh $SERVER 'sudo systemctl daemon-reload & sudo systemctl restart systemd-logind'; done

- With systemd=237-3ubuntu10.42 (-updates), after a few runs, systemd-logind is stuck as a running job and ssh is not responsive. DBUS messages[1] show that the AddMatch filter is set by systemd after systemd-logind has acquired its final name (systemd-login1)
- With systemd=237-3ubuntu10.43 (-proposed), systemd-logind does not get stuck and everything continues to work. In a scenario[2] where the systemd DBUS AddMatch message arrives after the final systemd-logind NameOwnerChanged, systemd is able to catch up thanks to the GetNameOwner introduced in the patch

[1] https://pastebin.ubuntu.com/p/NxRNX9bwCP/
[2] https://pastebin.ubuntu.com/p/jpKpW3g2bK/

Dan Streetman (ddstreet)
tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 237-3ubuntu10.43

---------------
systemd (237-3ubuntu10.43) bionic; urgency=medium

  [ Guilherme G. Piccoli ]
  * d/p/lp1830746-bump-mlock-ulimit-to-64Mb.patch:
    - Bump the memlock limit to match Focal and newer releases (LP: #1830746)
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=61adb797642f3dd2e5c14f7914c2949c665cefe8

  [ Victor Manuel Tapia King ]
  * d/p/lp1896614-core-Avoid-race-when-starting-dbus-services.patch:
    - Fix race when starting dbus services (LP: #1896614)
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=373cb6ccd6978a7112bbfd7e5cf4f703a9f8448e

  [ Dan Streetman ]
  * d/t/*,
    d/p/lp1892358/0001-test-increase-qemu-timeout-for-TEST-08-and-TEST-09.patch,
    d/p/lp1892358/0002-test-increase-timeout-for-TEST-17-UDEV-WANTS.patch,
    d/p/lp1892358/0003-test-increase-qemu-timeout-for-TEST-18-and-TEST-19.patch:
    - Increase QEMU_TIMEOUT on 'upstream' autopkgtest tests
    - Pull latest tests from newer releases to fix false negatives
    - Blacklist flaky 'upstream' TEST-03
      (LP: #1892358)
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=9fd8391c2499e163515b629a8ca5790898fc599d
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d1756b3e1c3e625ed7162cff4909e7a29c315051
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=37f8d73516a84e85e4057d6a92204b4a174af718
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=229ed2076eb773efc548035262b8b8009bf89207
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=f2d7b1f952667316cc07a4b3c5010e66ace07a90
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=659befe61bbfeb7afc9efa24458c9745412d7c6d

 -- Victor Manuel Tapia King <email address hidden> Wed, 07 Oct 2020 16:30:03 -0400

Changed in systemd (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.