systemd-logind leaves leftover sessions and scope files

Bug #1846787 reported by Heitor Alves de Siqueira on 2019-10-04
This bug affects 2 people
Affects                  Status        Importance  Assigned to
dbus (Ubuntu)            Fix Released  Medium      Unassigned
dbus (Ubuntu Xenial)     Fix Released  Medium      Heitor Alves de Siqueira
systemd (Ubuntu)         Fix Released  Medium      Unassigned
systemd (Ubuntu Xenial)  Fix Released  Medium      Heitor Alves de Siqueira

Bug Description

[Impact]
Scope file leakage can cause SSH delays and reduce performance in systemd

[Description]
The current systemd-logind version present in Xenial can leave abandoned SSH
sessions and scope files in cases where the host sees a lot of concurrent SSH
connections. These leftover sessions can slow down systemd performance
greatly, and can have an impact on sshd handling a great number of concurrent
connections.

To fix this issue, patches are needed in both dbus and systemd. These improve the
performance of the communication between dbus and systemd, so that they can
handle a higher volume of events (e.g. SSH logins). All of those patches are
already present from Bionic onwards, so we only need those fixes for Xenial.

== Systemd ==
Upstream patches:
- core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification (d8fdc62037b5)

$ git describe --contains d8fdc62037b5
v230~71^2~2

$ rmadison systemd
 systemd | 229-4ubuntu4 | xenial | source, ...
 systemd | 229-4ubuntu21.21 | xenial-security | source, ...
 systemd | 229-4ubuntu21.22 | xenial-updates | source, ... <--------
 systemd | 237-3ubuntu10 | bionic | source, ...
 systemd | 237-3ubuntu10.29 | bionic-security | source, ...
 systemd | 237-3ubuntu10.29 | bionic-updates | source, ...
 systemd | 237-3ubuntu10.31 | bionic-proposed | source, ...

== DBus ==
Upstream patches:
- Only read one message at a time if there are fds pending (892f084eeda0)
- bus: Fix timeout restarts (529600397bca)
- DBusMainLoop: ensure all required timeouts are restarted (446b0d9ac75a)

$ git describe --contains 892f084eeda0 529600397bca 446b0d9ac75a
dbus-1.11.10~44
dbus-1.11.10~45
dbus-1.11.16~2

$ rmadison dbus
 dbus | 1.10.6-1ubuntu3 | xenial | source, ...
 dbus | 1.10.6-1ubuntu3.4 | xenial-security | source, ...
 dbus | 1.10.6-1ubuntu3.4 | xenial-updates | source, ... <--------
 dbus | 1.12.2-1ubuntu1 | bionic | source, ...
 dbus | 1.12.2-1ubuntu1.1 | bionic-security | source, ...
 dbus | 1.12.2-1ubuntu1.1 | bionic-updates | source, ...

[Test Case]
1) Simulate a lot of concurrent SSH connections with e.g. a for loop:
multipass@xenial-logind:~$ for i in {1..1000}; do sleep 0.1; ssh localhost sleep 1 & done

2) Check for leaked sessions in /run/systemd/system/:
multipass@xenial-logind:~$ ls -ld /run/systemd/system/session-*.scope*
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-103.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-104.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-105.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-106.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-110.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-111.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-112.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-113.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-114.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-115.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-116.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-117.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-118.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-119.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-120.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-121.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-122.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-123.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-126.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-131.scope.d
drwxr-xr-x 2 root root 160 Oct 4 15:34 /run/systemd/system/session-134.scope.d
...
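
A minimal sketch of step 1 as a reusable helper, assuming a POSIX shell and key-based SSH access to localhost. The name run_burst is hypothetical and not part of any package. Unlike the bare loop, it waits for every background job to exit, so the scope directories are only inspected after all sessions have closed.

```shell
# run_burst COUNT DELAY COMMAND [ARG...]
# Start COMMAND in the background COUNT times, sleeping DELAY seconds
# before each start, then block until every instance has exited.
# The function body is a subshell, so `wait` only covers jobs started here.
run_burst() (
    count=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$delay"
        "$@" &
        i=$((i + 1))
    done
    wait
)
```

With this in place, the loop from step 1 would be `run_burst 1000 0.1 ssh localhost sleep 1`.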

[Regression Potential]
As the patches change the communication socket between dbus and systemd, possible regressions could cause systemd to not be notified of dbus events and vice versa. We could see units not getting started properly, and communication between different services breaking down (e.g. between systemd-logind and other processes).

In this case, the regression potential should be low as these patches have seen extensive testing both upstream and in more recent releases of Ubuntu. Nonetheless, these new packages will be rigorously tested through autopkgtest to avoid any possible Xenial-specific regressions.

Changed in dbus (Ubuntu):
status: New → Fix Released
Changed in systemd (Ubuntu):
status: New → Fix Released
Changed in dbus (Ubuntu Xenial):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Heitor Alves de Siqueira (halves)
description: updated
Changed in dbus (Ubuntu Xenial):
status: New → In Progress
Changed in systemd (Ubuntu Xenial):
status: New → In Progress
description: updated
description: updated
tags: added: sts-sponsor
description: updated
Dan Streetman (ddstreet) on 2019-10-07
Changed in systemd (Ubuntu Xenial):
importance: Undecided → Medium
Changed in systemd (Ubuntu):
importance: Undecided → Medium
Changed in dbus (Ubuntu):
importance: Undecided → Medium
Changed in dbus (Ubuntu Xenial):
importance: Undecided → Medium
tags: added: ddstreet sts-sponsor-ddstreet systemd xenial
Dan Streetman (ddstreet) wrote :

The patches are on the medium-to-large side, but I have reviewed them and as far as I can tell they appear correct. The systemd patch is needed to stop the cgroup agent from overrunning the dbus socket connection queue, and the dbus patches are needed to prevent a highly loaded dbus message queue from timing out a long queue of incoming messages.

Dan Streetman (ddstreet) wrote :

uploaded dbus and systemd to xenial queue, thanks!

Robie Basak (racb) wrote :

This looks good, thanks!

I think it would be appropriate for a second developer to review the patches for any backport-related issues. As you say, the patches are non-trivial. I asked Dimitri to take a look, as he's familiar with the code base and will probably be far quicker at reviewing them than I would be.

Timo Aaltonen (tjaalton) wrote :

no review happened yet, a month later..

Steve Langasek (vorlon) wrote :

Marking incomplete based on the fact that Robie has asked for additional review.

Changed in systemd (Ubuntu Xenial):
status: In Progress → Incomplete
Brian Murray (brian-murray) wrote :

Balint, could you have a look at this patch as a substitute for Dimitri? Thanks!

Balint Reczey (rbalint) wrote :

@brian-murray @racb I've checked the patch. This is a minimally modified cherry-pick from upstream, it is sane and there are no known regressions caused by it. I'm +1 on accepting it.

@halves thanks for the patch!

Dan Streetman (ddstreet) on 2019-11-26
Changed in systemd (Ubuntu Xenial):
status: Incomplete → In Progress

Hello Heitor, or anyone else affected,

Accepted systemd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/229-4ubuntu21.23 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial

All autopkgtests for the newly accepted systemd (229-4ubuntu21.23) for xenial have finished running.
The following regressions have been reported in tests triggered by the package:

umockdev/0.8.11-2 (i386)
apt/1.2.32 (ppc64el)
unity8/8.12+16.04.20160401-0ubuntu1 (i386)
udisks2/2.1.7-1ubuntu1 (amd64)
gvfs/1.28.2-1ubuntu1~16.04.3 (s390x)
systemd/229-4ubuntu21.23 (i386)
docker.io/18.09.7-0ubuntu1~16.04.5 (i386, arm64, s390x, ppc64el, amd64)
nplan/0.32~16.04.7 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/xenial/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Hello Heitor, or anyone else affected,

Accepted dbus into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dbus/1.10.6-1ubuntu3.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in dbus (Ubuntu Xenial):
status: In Progress → Fix Committed

All autopkgtests for the newly accepted dbus (1.10.6-1ubuntu3.5) for xenial have finished running.
The following regressions have been reported in tests triggered by the package:

libnih/blacklisted (armhf)
libreoffice/1:5.1.6~rc2-0ubuntu1~xenial10 (i386)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/xenial/update_excuses.html#dbus

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Summary of autopkg regressions

systemd:
- most tests passed on retest (thanks ddstreet)
- still failing: gvfs/s390x, docker.io/all archs but armhf, nplan/amd64

dbus:
- no new failures.

Looking at the pending failures/regressions.

systemd
--

umockdev/0.8.11-2 (i386)
 passed on retest.
apt/1.2.32 (ppc64el)
 passed on retest.
unity8/8.12+16.04.20160401-0ubuntu1 (i386)
 passed on retest.
udisks2/2.1.7-1ubuntu1 (amd64)
 passed on retest.
gvfs/1.28.2-1ubuntu1~16.04.3 (s390x)
 still failing.
systemd/229-4ubuntu21.23 (i386)
 passed on retest.
docker.io/18.09.7-0ubuntu1~16.04.5 (i386, arm64, s390x, ppc64el, amd64)
 still failing.
nplan/0.32~16.04.7 (amd64)
 still failing.

dbus
--

libnih/blacklisted (armhf)
 package not relevant?
libreoffice/1:5.1.6~rc2-0ubuntu1~xenial10 (i386)
 fails since 2017-07-06.

The autopkgtests regression on docker.io are unrelated to this change.

The failure is on debian/tests/basic-smoke; it happens because 'debootstrap stable <debian>' fails the gpg verification of the Release file -- the keys used for the Buster stable release are not found in Xenial's debian-archive-keyring.

This only happens on Xenial; on Bionic and later the debian-archive-keyring is sufficiently updated.

The workaround is to just patch docker.io/xenial to use 'debootstrap --no-gpg-check' (the debian debootstrap image is only used to run 'true' in a container).

The proper solution is a bit more involved on debian-archive-keyring; discussing this w/ cjwatson.

I'm not sure the proper solution is actually required in this case -- or even in the general case, as we haven't had bug reports yet about this since the Buster release in early July.

So we can probably just go with the workaround, but it's a non-runtime affecting change (build/test-time only), so would have to piggyback on another SRU to docker.io anyway, i.e., it won't make it to the archive just to fix this autopkgtest regression.

Thus, we should probably just ignore the docker.io regressions -- I'll test those w/ PPA build w/ the workaround above.

The remaining autopkgtests regressions (nplan/amd64 and gvfs/s390x) are also unrelated to this change.

- nplan/amd64 passed with 3 retests:

Before:

  test_mix_bridge_on_bond (__main__.TestNetworkManager) ... FAIL
  ...
  integration.py FAIL non-zero exit status 1

After:

  test_mix_bridge_on_bond (__main__.TestNetworkManager) ... ok
  ...
  integration.py PASS

- gvfs/s390x has a long history of flaky/likely failures on these 2 tests, as seen in

  https://autopkgtest.ubuntu.com/packages/gvfs/xenial/s390x

So it is OK to ignore this one, it's not a regression from _this_ change.

Examples:

1.28.2-1ubuntu1~16.04.3 systemd/229-4ubuntu21.23 2019-12-02 13:31:15 UTC 0h 41m 38s mfo fail

  trash:// deletion, attributes, restoring for a file in $HOME (API) ... FAIL
  trash:// deletion, attributes, restoring for a file in $HOME (CLI) ... FAIL

1.28.2-1ubuntu1~16.04.3 systemd/229-4ubuntu21.23 2019-11-27 22:19:05 UTC 0h 02m 29s ddstreet fail

  trash:// deletion, attributes, restoring for a file in $HOME (API) ... ok
  trash:// deletion, attributes, restoring for a file in $HOME (CLI) ... FAIL

with a previous, anecdotal entry of 6 retests until passing on:

1.28.2-1ubuntu1~16.04.2 apache2/2.4.18-2ubuntu3.9 2018-06-29 18:31:14 UTC 0h 02m 47s ahasenack pass

  trash:// deletion, attributes, restoring for a file in $HOME (API) ... ok
  trash:// deletion, attributes, restoring for a file in $HOME (CLI) ... ok

Verification done on xenial-proposed.

With the new systemd and dbus packages, there are no leaked sessions after the test-case of ssh loop.

The autopkgtests regressions reported previously are unrelated to this change (comments #11 to #16).

cheers,
Mauricio

Setup
---

$ sudo snap install --beta --classic multipass

$ multipass launch --cpus 2 --mem 8G --disk 8G --name lp1846787 xenial
$ multipass shell lp1846787

$ lsb_release -cs
xenial

$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ sudo apt update && sudo apt -y upgrade && sudo reboot

xenial-updates: there are leaked sessions after ssh loop. (bad)
---

$ multipass shell lp1846787

$ dpkg -s systemd dbus | grep ^Version:
Version: 229-4ubuntu21.22
Version: 1.10.6-1ubuntu3.4

$ find /run/systemd/system/ -name 'session-*.scope.d' | wc -l
1

$ find /run/systemd/system/ -name '*.scope.d'
/run/systemd/system/session-1.scope.d

$ for i in {1..1000}; do sleep 0.01; ssh localhost sleep 1 & done
...
[1000] 12191

$ jobs
$

$ find /run/systemd/system/ -name 'session-*.scope.d' | wc -l
32

$ find /run/systemd/system/ -name 'session-*.scope.d'
/run/systemd/system/session-906.scope.d
/run/systemd/system/session-896.scope.d
/run/systemd/system/session-848.scope.d
...
/run/systemd/system/session-1.scope.d
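
The count above still includes session-1, the live login. The following is a hedged sketch for separating leftover directories from sessions that logind still tracks. leaked_scopes is a hypothetical helper, and the usage line assumes the first column of `loginctl list-sessions --no-legend` is the session ID.

```shell
# leaked_scopes DIR [ACTIVE_ID...]
# Print the ID of every session-<ID>.scope.d directory under DIR whose ID
# is not among the ACTIVE_ID arguments, one ID per line.
leaked_scopes() (
    dir=$1
    shift
    active=" $* "                    # space-padded list for exact matching
    for path in "$dir"/session-*.scope.d; do
        [ -d "$path" ] || continue   # skips the literal pattern when nothing matches
        id=${path##*/session-}       # strip the leading path and "session-"
        id=${id%.scope.d}            # strip the ".scope.d" suffix
        case "$active" in
            *" $id "*) ;;            # session is still active
            *) printf '%s\n' "$id" ;;
        esac
    done
)
```

A possible invocation: `leaked_scopes /run/systemd/system $(loginctl list-sessions --no-legend | awk '{print $1}')`. On an unaffected system this should print nothing once the SSH loop has finished.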

xenial-proposed: there are NO leaked sessions after ssh loop. (good; tested 3x)
---

$ echo 'deb http://archive.ubuntu.com/ubuntu xenial-proposed main' | sudo tee /etc/apt/sources.list.d/xenial-proposed.list
$ sudo apt update && sudo apt -y install systemd dbus && sudo reboot

$ multipass shell lp1846787

$ dpkg -s systemd dbus | grep ^Version:
Version: 229-4ubuntu21.23
Version: 1.10.6-1ubuntu3.5

$ find /run/systemd/system/ -name 'session-*.scope.d' | wc -l
1

$ find /run/systemd/system/ -name 'session-*.scope.d'
/run/systemd/system/session-1.scope.d

$ for i in {1..1000}; do sleep 0.01; ssh localhost sleep 1 & done
...
[1000] 12462

$ jobs
$

$ find /run/systemd/system/ -name 'session-*.scope.d' | wc -l
1

$ find /run/systemd/system/ -name 'session-*.scope.d'
/run/systemd/system/session-1.scope.d
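
The version check done by hand in both runs can also be scripted. This is a sketch that assumes a Debian-based system with dpkg available. has_fix and installed_version are hypothetical helper names, and the version strings in the usage line are the ones accepted into xenial-proposed for this bug.

```shell
# installed_version PKG
# Print the installed version of PKG as recorded by dpkg.
installed_version() {
    dpkg-query -W -f='${Version}' "$1"
}

# has_fix INSTALLED FIXED
# Succeed if version INSTALLED is greater than or equal to FIXED,
# using Debian version ordering.
has_fix() {
    dpkg --compare-versions "$1" ge "$2"
}
```

For example: `has_fix "$(installed_version systemd)" 229-4ubuntu21.23 && has_fix "$(installed_version dbus)" 1.10.6-1ubuntu3.5 && echo "both fixes installed"`.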

tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-done
removed: verification-needed

For documentation purposes,

The systemd upload for this LP bug also resolves LP bug 1847512
(xenial: leftover scope units for Kubernetes transient mounts).

Verification steps posted on that bug.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 229-4ubuntu21.23

---------------
systemd (229-4ubuntu21.23) xenial; urgency=medium

  * d/p/core-use-an-AF_UNIX-SOCK_DGRAM-socket-for-cgroup-age.patch:
    - prevent logind from leaking session files (LP: #1846787)

 -- Heitor Alves de Siqueira <email address hidden> Mon, 07 Oct 2019 07:44:13 -0300

Changed in systemd (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Dan Streetman (ddstreet) on 2019-12-05
tags: removed: ddstreet sts-sponsor-ddstreet

For documentation purposes, the (unrelated) autopkgtest failure on docker.io is reported/worked on bug 1855481.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dbus - 1.10.6-1ubuntu3.5

---------------
dbus (1.10.6-1ubuntu3.5) xenial; urgency=medium

  * Prevent logind from leaking session files (LP: #1846787). Fixed by
    upstream patches:
    - d/p/Only-read-one-message-at-a-time-if-there-are-fds-pen.patch
    - d/p/bus-Fix-timeout-restarts.patch
    - d/p/DBusMainLoop-ensure-all-required-timeouts-are-restar.patch

 -- Heitor Alves de Siqueira <email address hidden> Mon, 07 Oct 2019 08:29:04 -0300

Changed in dbus (Ubuntu Xenial):
status: Fix Committed → Fix Released