Memory leak in 23.10 kernel (6.5.0-10)

Bug #2041668 reported by Craig G
This bug affects 4 people
Affects           Status          Importance   Assigned to   Milestone
linux (Ubuntu)    Fix Released    Undecided    Unassigned
Mantic            Fix Committed   Medium       Unassigned

Bug Description

There appears to be a memory 'leak' in the kernel of the latest Ubuntu release
(Mantic Minotaur, kernel version 6.5.0-10). I put 'leak' in quotes because this
memory appears to be allocated but never used. The kernel's memory over-commit
mechanism seems to prevent this leak from manifesting in OOM killer triggers or
other failures.

The primary symptom is a steady increase in the value reported as 'Committed_AS'
in '/proc/meminfo'. The other memory stats reported there remain reasonable,
though.
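The symptom described above can be tracked with a short script. This is a minimal sketch, not part of the original report: the function names (parse_meminfo, growth_kb_per_hour, watch_committed_as) and the sampling interval are illustrative assumptions. It only reads the 'Committed_AS' field from '/proc/meminfo' and estimates how fast it grows.

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Sizes are reported in kB; a few fields (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def growth_kb_per_hour(samples: list[tuple[float, int]]) -> float:
    """Average growth between the first and last (seconds, kB) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) * 3600.0 / (t1 - t0)


def watch_committed_as(interval_s: float = 60.0, count: int = 10) -> float:
    """Sample Committed_AS from /proc/meminfo and return its growth rate."""
    samples = []
    for i in range(count):
        info = parse_meminfo(Path("/proc/meminfo").read_text())
        samples.append((time.monotonic(), info["Committed_AS"]))
        print(f"Committed_AS: {info['Committed_AS']} kB")
        if i < count - 1:
            time.sleep(interval_s)
    return growth_kb_per_hour(samples)
```

Calling watch_committed_as() on an otherwise idle machine should show whether the value keeps climbing. On an unaffected kernel it would be expected to fluctuate rather than rise steadily.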

I've observed this problem on a fairly diverse set of machines (both VMs and
physical machines) with a variety of workloads. Busier machines seem to have a
faster leak rate. I've tried to narrow down the issue by rebooting into single
user mode, killing all userspace processes (except the systemd processes) and
removing as many kernel modules as possible. The problem continues in that
state. I didn't see any obvious culprits in /proc/slabinfo or
/proc/vmallocinfo.

So far, the only way I've been able to remediate this issue is to reboot back
into the Lunar Lobster kernel (6.2.0-35). I think this fact alone rules out any
triggers that may be part of the 23.10 userspace environment.

I've attached the generic debug info requested by this component's bug template
from an example machine. Please let me know if there is any more information I
can provide. It seems to be pretty trivial to reproduce though, and I'm
guessing it has not been reported yet because the leak doesn't actually manifest
in an out-of-memory situation. At least, I haven't observed that yet. The
worst case I've observed was 150 GB of memory committed on a machine with 16 GB
of physical RAM after about 24 hours. Moving back to the previous kernel
version, the committed memory statistic holds fairly steady around 7 GB on that
machine and workload.
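To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the report does not state the starting value on the affected kernel, so the 7 GB baseline from the previous kernel is assumed as the starting point, and the helper names are hypothetical.

```python
def overcommit_ratio(committed_gb: float, physical_gb: float) -> float:
    """How many times physical RAM the committed total represents."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return committed_gb / physical_gb


def average_growth_gb_per_hour(start_gb: float, end_gb: float, hours: float) -> float:
    """Average growth of the committed total over the observation window."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (end_gb - start_gb) / hours


# Figures from the report: 150 GB committed, 16 GB of RAM, about 24 hours.
# ASSUMPTION: growth started from the ~7 GB steady state seen on 6.2.0-35.
ratio = overcommit_ratio(150.0, 16.0)
rate = average_growth_gb_per_hour(7.0, 150.0, 24.0)
print(f"committed / physical: {ratio:.2f}x, growth: {rate:.2f} GB/hour")
```

Under that assumption the committed total reached about 9.4 times physical RAM, growing by roughly 6 GB per hour on average.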

Craig G (cgallek) wrote :

FWIW, this appears to be an issue in the upstream 6.5 kernel. It's discussed in more detail here:
https://<email address hidden>/

A fix was applied to the mainline kernel here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3cec50490969afd4a76ccee441f747d869ccff77

It was included as a patch in the stable series (6.5.4) here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.5.y&id=85746e2ab3fa8c392103507a2de765b2078a609f

Please consider patching this fix into the Ubuntu kernel build.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux-meta (Ubuntu):
status: New → Confirmed
no longer affects: linux-meta (Ubuntu)
no longer affects: linux-meta-hwe-6.5 (Ubuntu)
Changed in linux (Ubuntu Mantic):
status: New → Fix Committed
importance: Undecided → Medium
Matthew Ruffell (mruffell) wrote :

Hi Craig.

Excellent work tracking down the mailing list discussion. From what I can see,
actual memory allocation and freeing still work as intended; only the
accounting becomes incorrect, so your system works correctly but displays
wrong numbers in your metrics.

Regardless, the fix is currently queued up for 6.5.0-17-generic, which contains
the commit:

commit 3cec50490969afd4a76ccee441f747d869ccff77
Author: Linus Torvalds <email address hidden>
Date: Sat Sep 16 12:31:42 2023 -0700
Subject: vm: fix move_vma() memory accounting being off
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3cec50490969afd4a76ccee441f747d869ccff77

In Mantic's kernel it is 1f48faf4d74ee05aafbf406d6274727afc62a61f.

$ git describe --contains 1f48faf4d74ee05aafbf406d6274727afc62a61f
Ubuntu-6.5.0-16.16~1071

6.5.0-17-generic is currently in -proposed. If you would like to test it you can follow the instructions below:

Instructions to install (On a Mantic system):
1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF
2) sudo apt update
3) sudo apt install linux-image-6.5.0-17-generic linux-modules-6.5.0-17-generic linux-modules-extra-6.5.0-17-generic linux-headers-6.5.0-17-generic
4) sudo reboot
5) uname -rv
6.5.0-17-generic #17-Ubuntu SMP PREEMPT_DYNAMIC Thu Jan 11 14:01:59 UTC 2024

You probably want to remove the -proposed repository afterward.
6) sudo rm /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
7) sudo apt update
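The version check in step 5 can also be scripted, for example across a fleet of machines. The sketch below is an illustration under stated assumptions: it treats 6.5.0-17 as the first fixed Mantic build because that is the package named in this comment (the git describe output above suggests the commit first landed in the 6.5.0-16 tree), and the function names are hypothetical.

```python
import platform
import re

# ASSUMPTION: first Mantic ABI number shipping the move_vma() accounting fix,
# taken from the package version named in this bug, not from a changelog.
FIRST_FIXED_ABI = 17


def ubuntu_kernel_abi(release: str) -> tuple[int, ...]:
    """Split a release string such as '6.5.0-17-generic' into (6, 5, 0, 17)."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(group) for group in match.groups())


def is_affected_mantic_kernel(release: str) -> bool:
    """True for 6.5.0-N kernels with N below the assumed first fixed ABI."""
    version = ubuntu_kernel_abi(release)
    return version[:3] == (6, 5, 0) and version[3] < FIRST_FIXED_ABI


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    try:
        affected = is_affected_mantic_kernel(running)
    except ValueError:
        print(f"{running}: not an Ubuntu-style release string, cannot tell")
    else:
        print(f"{running}: affected={affected}")
```

Kernels outside the 6.5.0 series, such as the Lunar 6.2.0-35 kernel mentioned earlier, are reported as not affected because this check only covers the Mantic series.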

This should be released in the first week of February, around the 5th, give or
take a couple of days if any CVEs come up. https://kernel.ubuntu.com/

Thanks,
Matthew

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Craig G (cgallek) wrote :

Thanks for the update. I've just installed the suggested 17 series of kernel packages on a test machine. It will probably take a day or so to confirm the issue is gone, but as long as that patch is in there, I'm pretty confident it will address the issue I was seeing. I'll update again if I still see the issue, otherwise, thanks again for your help!
