systemd-oomd frequently kills firefox and visual studio code

Bug #1972159 reported by ChM
This bug affects 10 people
Affects             Status                      Importance  Assigned to    Milestone
systemd (Fedora)    Unknown                     Unknown
systemd (Ubuntu)    Status tracked in Kinetic
  Jammy             Fix Released                Undecided   Nick Rosbrook
  Kinetic           Fix Released                High        Nick Rosbrook

Bug Description

[Impact]

The "swap kill" side of systemd-oomd has caused unexpected behavior for desktop users. A user's browser, desktop session, or some other desktop application may be killed by systemd-oomd when SwapUsedLimit is reached, but system performance otherwise appears unaffected. This leaves users confused as to why their application was killed, and has a negative impact on their desktop experience.

For now, let's disable the swap kill functionality by default.

[Test Plan]

On Jammy desktop, check the ManagedOOMSwap property on -.slice:

$ systemctl show -- "-.slice" | grep "^ManagedOOMSwap"
ManagedOOMSwap=kill # After the fix, this should print ManagedOOMSwap=auto
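
The check above can also be scripted. The following is a minimal sketch, assuming the `KEY=VALUE` output format shown in the test plan; the helper names `parse_unit_properties` and `swap_kill_enabled` are hypothetical, and the sample strings stand in for real `systemctl show` output.

```python
def parse_unit_properties(show_output):
    """Parse `systemctl show` style KEY=VALUE lines into a dict."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


def swap_kill_enabled(show_output):
    """Return True when the unit's ManagedOOMSwap property is 'kill'."""
    return parse_unit_properties(show_output).get("ManagedOOMSwap") == "kill"


# Sample text standing in for `systemctl show -- "-.slice"` output.
before_fix = "ManagedOOMSwap=kill\nManagedOOMMemoryPressure=auto\n"
after_fix = "ManagedOOMSwap=auto\nManagedOOMMemoryPressure=auto\n"
print(swap_kill_enabled(before_fix), swap_kill_enabled(after_fix))
```

On a live system the input text could be obtained with `subprocess.run(["systemctl", "show", "--", "-.slice"], capture_output=True, text=True).stdout`.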

[Where problems could occur]

Disabling swap kill by default means that users may experience degraded system performance due to high swap usage, because systemd-oomd will no longer act on cgroups with high swap usage.

[Other Info]

If a user wishes to restore the original systemd-oomd behavior, they can do so by creating the following overrides file:

 $ cat /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf
 [Slice]
 ManagedOOMSwap=kill

[Original Description]

Since I installed Ubuntu 22.04, Firefox and Visual Studio Code have frequently been killed by systemd-oomd (about every 2 hours).

I have 8 GB of memory and never experienced this before the upgrade to Ubuntu 22.04. I therefore assume that the claim that there is not enough memory is unwarranted. Did 64 GB of memory become the minimum requirement to run Ubuntu?

The second problem is that it gives a very bad user experience, which is critical for new Ubuntu users.

There should be a warning before apps are killed, to give the user the opportunity to save their data. There should at least be an apology and an explanation after an app is killed.

The current behavior gives the impression that Ubuntu 22.04 is unreliable and unsafe to use, which is a problem for an LTS release that many people might want to use in critical production contexts.

There might be a configuration problem with systemd-oomd, or it might simply be misbehaving. I would recommend disabling it or removing it completely until this problem is resolved. This is what I will do myself, because I have work to do.

ChM (christophe-meessen) wrote :

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7,7Gi       3,3Gi       2,2Gi       113Mi       2,2Gi       4,0Gi
Swap:          2,0Gi       1,1Gi       936Mi

Syslog:

May 9 09:55:32 xxx systemd[2839]: snap.firefox.firefox.b9635bb0-3585-4241-8d1b-8936cedebc3a.scope: systemd-oomd killed 288 process(es) in this unit.
May 9 09:55:32 xxx systemd[2839]: snap.firefox.firefox.b9635bb0-3585-4241-8d1b-8936cedebc3a.scope: Consumed 5min 25.300s CPU time.
May 9 09:55:33 xxx systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
May 9 09:55:37 xxx systemd-oomd[607]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/snap.code.code.9ab9bef1-a97e-46df-8879-2377452219ab.scope due to memory used (8181518336) / total (8280240128) and swap used (1969328128) / total (2147479552) being more than 90.00%
May 9 09:55:37 xxx systemd[2839]: snap.code.code.9ab9bef1-a97e-46df-8879-2377452219ab.scope: systemd-oomd killed 82 process(es) in this unit.
May 9 09:55:38 xxx systemd[2839]: snap.code.code.9ab9bef1-a97e-46df-8879-2377452219ab.scope: Consumed 29min 25.392s CPU time.

ChM (christophe-meessen) wrote :

$ LANG=en free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       3.3Gi       2.1Gi       113Mi       2.2Gi       4.0Gi
Swap:          2.0Gi       1.1Gi       941Mi

Sebastien Bacher (seb128) wrote :

When did you get the 'free' information?

The log states

> due to memory used (8181518336) / total (8280240128) and swap used (1969328128) / total (2147479552)

it would be interesting to know if you actually hit the limits or if the computation is wrong?
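
The figures quoted from the log can be turned into percentages to see whether both exceed the 90% limit named in the message. This is a minimal sketch; the regular expression is an assumption based only on the message text quoted above, and `usage_from_kill_message` is a hypothetical helper name.

```python
import re

# Matches the byte counts in the systemd-oomd kill message quoted above.
KILL_RE = re.compile(
    r"memory used \((\d+)\) / total \((\d+)\) and "
    r"swap used \((\d+)\) / total \((\d+)\)"
)


def usage_from_kill_message(message):
    """Return (memory_percent, swap_percent) from a systemd-oomd kill message."""
    match = KILL_RE.search(message)
    if match is None:
        raise ValueError("not a systemd-oomd swap-kill message")
    mem_used, mem_total, swap_used, swap_total = map(int, match.groups())
    return 100.0 * mem_used / mem_total, 100.0 * swap_used / swap_total


line = (
    "due to memory used (8181518336) / total (8280240128) and "
    "swap used (1969328128) / total (2147479552) being more than 90.00%"
)
mem_pct, swap_pct = usage_from_kill_message(line)
print(f"memory {mem_pct:.1f}%, swap {swap_pct:.1f}%")  # memory 98.8%, swap 91.7%
```

Taken at face value, both reported ratios are above 90%; whether the "memory used" figure reflects real memory demand is a separate question.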

Changed in systemd (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Lukas Märdian (slyon)
tags: added: rls-kk-incoming
tags: added: rls-jj-incoming
removed: rls-kk-incoming
ChM (christophe-meessen) wrote :

I collected the free -h information after Firefox and Visual Studio Code were killed.

I assume the values in the log were generated by systemd-oomd when it killed the apps.

I have disabled it, so I can't contribute any further. Sorry.

Lukas Märdian (slyon)
tags: removed: rls-jj-incoming
ChM (christophe-meessen) wrote :

I have upgraded my computer to 40GB of memory and restarted systemd-oomd.

Memory usage is stable. No memory leak to report. I can run further tests if needed.

Connor Nolan (thebrokenrail) wrote :

This also affects me:

May 26 08:47:22 <hostname> systemd-oomd[542]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/snap.firefox.firefox.cef581a2-89d2-4a72-86a9-8a0a30cfdb86.scope due to memory used (14922825728) / total (16526151680) and swap used (1944498176) / total (2147479552) being more than 90.00%

I never had any issues with Ubuntu 20.04 and this also occurs with QtCreator (upstream version not distribution version).

ChM (christophe-meessen) wrote :

The problem definitely disappeared once I upgraded the memory to 40 GB. However, the memory usage displayed in top never exceeded 6 GB.

John S (johnps) wrote :

I have 32 GB of RAM, and systemd-oomd was killing my entire user session and dropping me back to GDM. It took me over a week to realise this was happening; I was kicked back to the login screen at least 15 times, losing all my work each time. Logging back in produced a lot of log spam that hid the OOM message, and I was close to reinstalling or switching distro to fix it. I was using i3, but this was totally unusable for me and I had to remove the service. Does it just not play nicely with applications that grab as much RAM as possible and free it when the system is under pressure?

Personally if this isn't fixed and OOM gets turned back on at some point it would be severe enough for me to switch distro.

Kevin (kevin-b-er) wrote :

This is greatly exacerbated because systemd before v251 uses MemFree rather than MemAvailable to decide how much memory is remaining. Since Linux aggressively uses free memory for caching, MemFree stays low, which results in systemd-oomd killing applications excessively.

There's a fix in upstream commit 030bc91cb98385904b28a839d1e04bb4160a52d2, which was released as part of v251 about a week ago.
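
To illustrate the difference between the two accounting methods described above, here is a minimal sketch that computes "used" memory both ways from /proc/meminfo-style text. The sample values are hypothetical and are not taken from any machine in this report; `parse_meminfo` and `used_percent` are illustrative helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])  # first field is the number
    return values


def used_percent(meminfo, free_key):
    """Percent of MemTotal counted as used, given which field counts as free."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo[free_key]) / total


# Hypothetical sample: most "non-free" memory is reclaimable page cache.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          400000 kB
MemAvailable:    4000000 kB
"""
info = parse_meminfo(SAMPLE)
print(used_percent(info, "MemFree"))       # 95.0, page cache counted as used
print(used_percent(info, "MemAvailable"))  # 50.0, reclaimable memory discounted
```

With these assumed numbers, the MemFree-based figure crosses a 90% limit while the MemAvailable-based figure does not.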

Adolfo Jayme Barrientos (fitojb) wrote :

I wish upstart was back

Sebastien Bacher (seb128) wrote :

@Kevin, we cherrypicked that patch in 22.04 before release

https://launchpad.net/ubuntu/+source/systemd/249.11-0ubuntu3

Lester Carballo Pérez (lestcape) wrote :

In my case this occurred while compiling the kernel. The build takes 6 hours, and on three occasions it crashed before finishing. See:
https://gitlab.freedesktop.org/drm/amd/-/issues/1569#note_1409226

To make sure it finished, I dropped the cache several times during the build:
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"

When I did that everything was fine, but it is not nice to have to keep watching the cache at intervals and clear it when it gets too high.

LC_MESSAGES=C free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        16Gi        13Gi       277Mi       1.3Gi        14Gi
Swap:           39Gi       1.0Mi        39Gi

djchandler (djchandler) wrote :

I use a 16 GB swap file, equal to the RAM size, on one of my systems that has been upgraded to 22.04, running Wayland and GNOME. No problems for me yet. Could simply increasing the swap file size alleviate this issue for most people until the patch(es) reach end users? (Rhetorical question, no feedback please.)

Sebastien Bacher (seb128) wrote :
Lukas Märdian (slyon)
tags: added: rls-jj-incoming
tags: added: rls-kk-incoming
Lukas Märdian (slyon) wrote :

As discussed on the mailing list, would you be able to test a modification to the systemd-oomd configuration by placing a new file in /etc and report back if that improves the situation for you (after a reboot)?

$ cat /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf
[Slice]
ManagedOOMSwap=auto

Changed in systemd (Ubuntu):
status: Incomplete → Confirmed
tags: added: fr-2482
tags: removed: rls-jj-incoming rls-kk-incoming
Damiön la Bagh (kat-amsterdam) wrote :

This is also occurring on Ubuntu Server on VPS.

I have 7 Ubuntu 22.04 VPSes, each with 1 GB of physical RAM and a 3 GB swap file. The programs I'm running get killed, which stops production, even though the swap file is far from full.

Nick Rosbrook (enr0n) wrote :

> This is also occurring on Ubuntu Server on VPS.

Did you install/enable systemd-oomd manually? AFAIK, systemd-oomd is not enabled by default on server. Or is something else killing your programs?

> swap file is far from full.

If it is systemd-oomd, do you have logs from this (journalctl -u systemd-oomd)?

Nick Rosbrook (enr0n) wrote :

> $ cat /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf
> [Slice]
> ManagedOOMSwap=auto

I have been running with this configuration this week, and have been running a script to log occurrences of my memory and swap usage each exceeding 90%. I have had several such occurrences, but have yet to experience any noticeable performance issues. For example, this morning `oomctl` reported the following usage, but I was able to continue using my system without any noticeable difference:

$ oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Memory: Used: 14.3G Total: 15.5G
        Swap: Used: 979.9M Total: 979.9M
Swap Monitored CGroups:
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 50.00%
                Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 6s
                Current Memory Usage: 13.5G
                Memory Min: 0B
                Memory Low: 0B
                Pgscan: 13490147
                Last Pgscan: 13490147

This is just one data point of course, but it puts me in favor of disabling swap kill for Jammy.
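
For reference, the "System Context" figures in the oomctl output above can be converted to percentages. This is a minimal sketch that assumes the size suffixes are base-1024 units; `parse_size` and `oomctl_used_percent` are hypothetical helper names, and only the two `Used: ... Total: ...` lines shown above are handled.

```python
import re

# Assumed base-1024 multipliers for the suffixes seen in the output above.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '979.9M' or '0B' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)([BKMGT])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def oomctl_used_percent(line):
    """Percent used from a line like 'Memory: Used: 14.3G Total: 15.5G'."""
    match = re.search(r"Used:\s*(\S+)\s+Total:\s*(\S+)", line)
    if match is None:
        raise ValueError(f"no Used/Total pair in: {line!r}")
    return 100.0 * parse_size(match.group(1)) / parse_size(match.group(2))


for line in ("Memory: Used: 14.3G Total: 15.5G", "Swap: Used: 979.9M Total: 979.9M"):
    print(f"{line.split(':')[0]}: {oomctl_used_percent(line):.1f}%")
```

For the output above this gives roughly 92.3% memory and 100.0% swap, so both were over the 90% limit at the time the snapshot was taken.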

Nick Rosbrook (enr0n)
description: updated
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu Kinetic):
assignee: nobody → Nick Rosbrook (enr0n)
Lukas Märdian (slyon) wrote :

Sponsored into Jammy. The fix is also part of the systemd v251 merge pending for Kinetic.

Łukasz Zemczak (sil2100) wrote :

What is the status of this in kinetic? Is this staged to be included in the v251 systemd merge that is planned shortly? I'll review it assuming that this is the case.

Changed in systemd (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello ChM, or anyone else affected,

Accepted systemd into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/249.11-0ubuntu3.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Łukasz Zemczak (sil2100) wrote :

The SRU looked good so I accepted it into -proposed, since this basically seems to be the current consensus regarding the systemd-oomd situation in jammy. It's a behavioral change, but not one that can reasonably be called a 'regression'.

Note to SRU members: let's make sure that the kinetic counterpart gets staged before this is released into -updates.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/249.11-0ubuntu3.4)

All autopkgtests for the newly accepted systemd (249.11-0ubuntu3.4) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

prometheus-libvirt-exporter/unknown (s390x)
corosync-qdevice/unknown (ppc64el, s390x)
csync2/unknown (ppc64el, s390x)
pyudev/unknown (ppc64el)
dlm/unknown (s390x)
systemd/unknown (ppc64el)
tpm2-abrmd/unknown (ppc64el)
dbus/unknown (ppc64el)
linux-lowlatency/5.15.0-40.43 (arm64)
samba/unknown (ppc64el)
dovecot/unknown (ppc64el, arm64)
systemd/249.11-0ubuntu3.4 (armhf)
prometheus-exporter-exporter/unknown (ppc64el)
conntrack-tools/unknown (ppc64el, s390x)
cups/unknown (s390x)
dpdk/unknown (ppc64el, arm64)
network-manager/1.36.6-0ubuntu2 (arm64)
php8.1/unknown (s390x)
procps/unknown (ppc64el)
gvfs/1.48.2-0ubuntu1 (ppc64el)
tgt/unknown (ppc64el)
redis/unknown (ppc64el)
qlcplus/unknown (ppc64el, s390x)
prometheus-squid-exporter/unknown (s390x)
flatpak/1.12.7-1 (amd64)
dq/unknown (arm64)
rtkit/unknown (ppc64el)
cockpit/unknown (s390x)
netplan.io/unknown (s390x)
pdns-recursor/unknown (s390x)
remctl/unknown (ppc64el)
comitup/unknown (s390x)
policykit-1/unknown (ppc64el)
rust-whoami/unknown (ppc64el)
casync/2+20201210-1build1 (ppc64el)
dbus-broker/unknown (ppc64el, s390x)
rpcbind/unknown (ppc64el)
openzwave/unknown (s390x)
netplan.io/0.104-0ubuntu2 (amd64, arm64)
postgresql-14/unknown (ppc64el)
python-uinput/unknown (s390x)
libsoup2.4/unknown (ppc64el)
libsfml/unknown (ppc64el)
polkit-qt-1/unknown (ppc64el)
debspawn/unknown (ppc64el)
corosync/unknown (ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Nick Rosbrook (enr0n) wrote :

I have tested systemd 249.11-0ubuntu3.4 from jammy-proposed to verify the fix:

$ systemctl --version
systemd 249 (249.11-0ubuntu3.3)
[...]

$ systemctl show -- "-.slice" | grep "^ManagedOOMSwap"
ManagedOOMSwap=kill

$ sudo apt update && sudo apt install -y systemd
[...]

$ systemctl --version
systemd 249 (249.11-0ubuntu3.4)
[...]

$ systemctl show -- "-.slice" | grep "^ManagedOOMSwap"
ManagedOOMSwap=auto

tags: added: verification-done-jammy
removed: verification-needed-jammy
Dan Streetman (ddstreet) wrote :

Have you checked that systemd-oomd is actually killing anything in any situation now? Meaning, is this effectively the same as completely disabling systemd-oomd? And if it is the same as completely disabling systemd-oomd, would that be a better default?

Nick Rosbrook (enr0n) wrote :

Yes, I have run stress tests to confirm that the "memory pressure" kill logic is still enabled in systemd-oomd. I have not personally experienced such occurrences in my day-to-day, however.

But to answer your question, no this is not the same as entirely disabling systemd-oomd.

Lukas Märdian (slyon)
Changed in systemd (Ubuntu Kinetic):
status: Confirmed → Fix Committed
Lukas Märdian (slyon) wrote :

All autopkgtest regressions have been resolved by re-running them, using proper triggers.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 251.2-2ubuntu1

---------------
systemd (251.2-2ubuntu1) kinetic; urgency=medium

  [ Nick Rosbrook ]
  * Merge to Ubuntu from Debian unstable
    - Dropped patches (applied upstream)
      + test-explicitly-configure-oomd-stuff-via-dropins.patch
      + test-enable-systemd-oomd.service.patch
      + linux-5.15-compat-ioprio/shared-split-out-ioprio-related-stuff-into-ioprio-util.-c.patch
      + linux-5.15-compat-ioprio/variuos-add-missing-includes.patch
      + linux-5.15-compat-ioprio/man-don-t-mention-IOSchedulingClass-none-anymore-in-the-d.patch
      + linux-5.15-compat-ioprio/test-add-test-for-ioprio-normalization.patch
      + linux-5.15-compat-ioprio/Define-ioprio_-get-set-the-same-as-other-compat-syscalls.patch
      + linux-5.15-compat-ioprio/Get-rid-of-ioprio.h-and-add-a-minimalistic-reimplementati.patch
      + linux-5.15-compat-ioprio/ioprio-util-add-macro-for-default-ioprio-settings.patch
      + linux-5.15-compat-ioprio/ioprio-normalize-io-priority-values-in-configuration.patch
      + linux-5.15-compat-ioprio/core-normalize-ioprio-values-we-acquire-from-kernel.patch
      + test-also-show-the-memory-pressure-of-testchill.service.patch
      + test-make-test-55-oomd-less-flaky.patch
      + lp1964494-network-do-not-enable-IPv4-ACD-for-IPv4-link-local-a.patch
      + lp1966381-oomd-calculate-used-memory-with-MemAvailable-instead-of-M.patch
      + lp1926860-hwdb-remove-the-tablet-pad-entry-for-the-UC-Logic-1060N.patch
      + oomd-move-oomctl-to-bindir.patch
      + test-enable-debug-logging-of-systemd-oomd.patch
      + lp1943561/Add-additional-Dell-models-that-require-ACCEL_LOCATION-ba.patch
      + lp1943561/Use-SKU-to-identify-Dell-clamshell-models-for-acceleromet.patch
      + lp1929345/hwdb-Force-release-calculator-key-on-all-HP-OMEN-laptops.patch
      + lp1929345/hwdb-Add-force-release-for-HP-Omen-15-calculator-key.-205.patch
      + sysusers-split-up-systemd.conf.patch
      + hwdb-Add-mic-mute-key-mapping-for-HP-Elite-x360.patch
      + test-check-memory-pressure-more-frequently.patch
      + meson-minor-cleanup.patch
      + units-don-t-install-dbus-org.freedesktop.oom1.service-ali.patch
      + lp1950508-cgroup-check-if-any-controller-is-in-use-as-v1.patch
      + lp1952735-keymap-Add-microphone-mute-keymap-for-Dell-Machine.patch
      + test-tweak-parameters-for-TEST-55-OOMD.patch
      + deny-list-TEST-29-PORTABLE-and-TEST-50-DISSECT.patch
      + lp1955997-unmask-intel-hid-for-HP-machines.patch
      + lp1952733-hwdb-60-keyboard-Update-Dell-Privacy-Micmute-Hotkey-Map.patch
      + Merge-pull-request-20705-from-yuwata-test-oomd-util.patch
    - Refreshed patches
      + debian/Ubuntu-UseDomains-by-default.patch
      + debian/UBUNTU-Support-system-image-read-only-etc.patch
      + Revert-network-if-sys-is-rw-then-udev-should-be-around.patch
      + debian/UBUNTU-src-test-testmount-util.c-Skip-parts-of-test-mount-util-in-LXC.patch
      + lp1950794-Revert-sd-dhcp-do-not-use-detect_container-to-guess-.patch
      + 0001-Revert-tests-add-test-case-for-UMask-BindPaths-combi.patch
  * Drop debian/Ubuntu-core-in-execute-soft-fail-setting-Nice-priority-when.patch.
    This patch...

Changed in systemd (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Tim Richardson (tim-richardson) wrote :

For me, systemd-oomd no longer kills anything at all. The memory pressure threshold is still active, but I think the default of 50% on the user slice is far too high. I can put a 4 GB test VM under extreme memory load and get so much swap activity that the CPU load in a two-core VM goes above 50, yet the memory pressure score is 14%. I cannot conceive of what type of load would get it to 50%.

I have set the user slice threshold to 10%, and when I attempt to load 100 tabs, the browser is killed a couple of minutes after memory and swap are exhausted. It's not an aggressive kill, but it lets systemd-oomd actually kill something.

So far it has only ever killed the guilty app. If the aim is for systemd-oomd never to kill anything, a 50% memory pressure threshold with swap kill off achieves that goal, but if you want it to kill based on memory pressure, the threshold needs to be much lower. Killing on memory pressure was supposed to be one of the great things about systemd-oomd, I thought.

I note that systemd-cgtop shows many tasks under the user slice (I have about 400 when idle, and about 1200 when the browser is trying to load all those tabs). All the system slices have < 5 tasks, so one or two of those processes being stalled will result in a steep increase in the memory pressure KPI. But perhaps with so many tasks in the user slice, the KPI is highly "diluted" and needs a much lower threshold to be meaningful.

Maybe this is all very different on a Raspberry Pi.
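
The threshold comparison discussed above can be sketched as follows. The input format follows the kernel's pressure stall information (PSI) files such as a cgroup's memory.pressure; the sample values are hypothetical, modelled on the 14% figure mentioned in the comment. That systemd-oomd evaluates the `full` line's avg10 value is an assumption here, and the sketch checks a single sample, whereas oomctl above also reports a pressure duration (20s by default).

```python
def parse_pressure(text):
    """Parse PSI text (as in a cgroup's memory.pressure) into nested dicts."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]  # kind is "some" or "full"
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def over_limit(text, limit_percent, kind="full"):
    """True when avg10 on the chosen PSI line exceeds the limit (one sample only)."""
    return parse_pressure(text)[kind]["avg10"] > limit_percent


# Hypothetical sample, not measured on any machine in this report.
SAMPLE = """\
some avg10=31.50 avg60=20.10 avg300=8.75 total=123456789
full avg10=14.00 avg60=9.80 avg300=4.20 total=98765432
"""
print(over_limit(SAMPLE, 50.0))  # False: 14% is below a 50% limit
print(over_limit(SAMPLE, 10.0))  # True: 14% is above a 10% limit
```

Under these assumptions, a 14% pressure reading never triggers at the 50% limit but does at a 10% limit, which matches the behaviour described in the comment.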

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 249.11-0ubuntu3.4

---------------
systemd (249.11-0ubuntu3.4) jammy; urgency=medium

  [ Mustafa Kemal Gilor ]
  * d/p/lp1978079-efi-pstore-not-cleared-on-boot.patch: pstore: Run after
    modules are loaded. Thanks to Alexander Graf <email address hidden>.
    (LP: #1978079)
    Author: Mustafa Kemal Gilor
    File: debian/patches/lp1978079-efi-pstore-not-cleared-on-boot.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d990b13612810a296246011ad66a165b30166702

  [ Nick Rosbrook ]
  * systemd-oomd: set ManagedOOMSwap=auto on -.slice (LP: #1972159)
    This has the effect of disabling swap kill by default, so cgroups will
    only be monitored for memory pressure, and not swap usage.
    File: debian/extra/systemd-oomd-defaults/-.slice.d/10-oomd-root-slice-defaults.conf
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e93c944c58ec376454301e9c9b55d35be7c14a89

 -- Nick Rosbrook <email address hidden> Mon, 27 Jun 2022 14:28:46 -0400

Changed in systemd (Ubuntu Jammy):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
