Memory leak in 23.10 kernel (6.5.0-10)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Mantic | Won't Fix | Medium | Unassigned |
Bug Description
There appears to be a memory 'leak' in the kernel of the latest Ubuntu release
(Mantic Minotaur, kernel version 6.5.0-10). I put 'leak' in quotes because this
memory appears to be allocated but never used. The kernel's memory overcommit
mechanism seems to prevent this leak from manifesting as OOM killer invocations
or other failures.
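A minimal Python sketch of why the growth can stay silent, assuming the machine runs with the default `vm.overcommit_memory` setting. `Committed_AS` is accounted in every mode, but only strict mode (2) makes allocations fail once they would push it past `CommitLimit`; the heuristic default (0) and always-overcommit (1) modes do not enforce that limit. The helper names `parse_meminfo` and `commit_overshoot` are hypothetical.

```python
from pathlib import Path


# Hypothetical helper: parse /proc/meminfo-style text into {field: value}.
# Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
def parse_meminfo(text: str) -> dict[str, int]:
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Hypothetical helper: returns (Committed_AS - CommitLimit in kB, enforced?).
# A positive first value means more is committed than CommitLimit allows;
# only vm.overcommit_memory == 2 (strict) makes allocations fail at that limit.
def commit_overshoot(meminfo_text: str, overcommit_mode: int) -> tuple[int, bool]:
    info = parse_meminfo(meminfo_text)
    over_kb = info["Committed_AS"] - info["CommitLimit"]
    return over_kb, overcommit_mode == 2


MEMINFO = Path("/proc/meminfo")
MODE_FILE = Path("/proc/sys/vm/overcommit_memory")

if __name__ == "__main__" and MEMINFO.exists() and MODE_FILE.exists():
    over_kb, enforced = commit_overshoot(MEMINFO.read_text(), int(MODE_FILE.read_text()))
    print(f"Committed_AS - CommitLimit: {over_kb / 2**20:+.1f} GiB "
          f"(limit enforced: {enforced})")
```

On a machine in mode 0, a large positive overshoot with `enforced` reported as `False` would be consistent with the behaviour described above: the counter keeps climbing, but nothing checks it against the limit.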
The primary symptom is a steady increase in the value reported as 'Committed_AS'
in '/proc/meminfo'. The other memory stats reported there remain reasonable,
though.
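To put a number on the "steady increase", one option is to sample `Committed_AS` periodically and fit a slope. This is a minimal Python sketch, assuming the standard `/proc/meminfo` line format (`Committed_AS:   <value> kB`); `committed_as_kb`, `growth_gib_per_hour` and `sample_committed_as` are hypothetical names, not part of any existing tool.

```python
import time
from pathlib import Path


# Hypothetical helper: extract the Committed_AS value (kB) from meminfo text.
def committed_as_kb(meminfo_text: str) -> int:
    for line in meminfo_text.splitlines():
        if line.startswith("Committed_AS:"):
            return int(line.split()[1])
    raise ValueError("Committed_AS not found")


# Least-squares slope of (seconds, kB) samples, expressed in GiB per hour.
def growth_gib_per_hour(samples: list[tuple[float, int]]) -> float:
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    kb_per_second = num / den
    return kb_per_second * 3600 / 2**20


# Hypothetical sampler: read /proc/meminfo `count` times, `interval_s` apart.
def sample_committed_as(count: int = 10, interval_s: float = 60.0) -> list[tuple[float, int]]:
    samples = []
    for i in range(count):
        text = Path("/proc/meminfo").read_text()
        samples.append((time.monotonic(), committed_as_kb(text)))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples
```

For example, `growth_gib_per_hour(sample_committed_as(count=30, interval_s=120))` on an affected 6.5 kernel would be expected to report a clearly positive rate, and a rate near zero on the 6.2 kernel. A least-squares slope is used rather than last-minus-first so that short spikes from normal workload churn carry less weight.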
I've observed this problem on a fairly diverse set of machines (both VMs and
physical machines) with a variety of workloads. Busier machines seem to have a
faster leak rate. I've tried to narrow down the issue by rebooting into single
user mode, killing all userspace processes (except the systemd processes) and
removing as many kernel modules as possible. The problem persists in that
state. I didn't see any obvious culprits in /proc/slabinfo or
/proc/vmallocinfo.
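One way to repeat the /proc/slabinfo check is to diff two snapshots and rank caches by growth. This is a minimal Python sketch, assuming the slabinfo 2.1 text format and root access (the file is normally readable only by root); `slab_bytes` and `slab_growth` are hypothetical names. It only catches growth in slab memory: `Committed_AS` is an accounting counter, so a commit charge that grows without matching allocations need not show up here at all.

```python
# Hypothetical helper: map each slab cache in /proc/slabinfo (format 2.1)
# to the bytes of object slots it currently holds (num_objs * objsize).
def slab_bytes(slabinfo_text: str) -> dict[str, int]:
    caches: dict[str, int] = {}
    for line in slabinfo_text.splitlines():
        # Skip the version line, the "# name ..." header and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        caches[name] = num_objs * objsize
    return caches


# Hypothetical helper: caches whose footprint grew between two snapshots,
# largest growth first, limited to the `top` entries.
def slab_growth(before: str, after: str, top: int = 10) -> list[tuple[str, int]]:
    old, new = slab_bytes(before), slab_bytes(after)
    growth = [(name, size - old.get(name, 0)) for name, size in new.items()]
    growth = [item for item in growth if item[1] > 0]
    return sorted(growth, key=lambda item: item[1], reverse=True)[:top]
```

In practice this would mean reading `/proc/slabinfo` once, waiting while `Committed_AS` climbs, reading it again, and passing both texts to `slab_growth`.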
So far, the only way I've been able to remediate this issue is to reboot back
into the Lunar Lobster kernel (6.2.0-35). Since the 23.10 userspace is unchanged
when I do that, I think this fact alone rules out any triggers that may be part
of the 23.10 userspace environment.
I've attached the generic debug info requested by this component's bug
template, collected from an example machine. Please let me know if there is any
more information I can provide. That said, it seems pretty trivial to reproduce,
and I'm guessing it hasn't been reported yet because the leak doesn't actually
lead to an out-of-memory situation. At least, I haven't observed that yet. The
worst case I've observed was 150 GB of memory committed on a machine with 16 GB
of physical RAM after about 24 hours. After moving back to the previous kernel
version, the committed memory statistic holds fairly steady at around 7 GB on
that machine and workload.
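As a rough back-of-the-envelope estimate, assuming `Committed_AS` started near the ~7 GB seen on the older kernel and grew roughly linearly over the ~24 hours (neither of which was measured), the worst case above works out to:

```latex
\[
\text{rate} \approx \frac{150\ \text{GB} - 7\ \text{GB}}{24\ \text{h}}
            = \frac{143\ \text{GB}}{24\ \text{h}} \approx 6\ \text{GB/h},
\qquad
\frac{150\ \text{GB}}{16\ \text{GB}} \approx 9.4 \times \text{physical RAM}.
\]
```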
no longer affects: linux-meta (Ubuntu)
no longer affects: linux-meta-hwe-6.5 (Ubuntu)
Changed in linux (Ubuntu Mantic):
status: New → Fix Committed
importance: Undecided → Medium
FWIW, this appears to be an issue in the upstream 6.5 kernel. It's discussed in more detail here:
https://<email address hidden>/
A fix was applied to the mainline kernel here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3cec50490969afd4a76ccee441f747d869ccff77
It was included as a patch in the stable series (6.5.4) here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.5.y&id=85746e2ab3fa8c392103507a2de765b2078a609f
Please consider patching this fix into the Ubuntu kernel build.