Memory leak on large server
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Hi,
We are trying to diagnose a kernel memory leak on a production Ubuntu 22.04.2 LTS server.
We have tried several official Ubuntu kernels (5.15-aws, 5.19-aws and now even 6.2.0-1004-aws), all Ubuntu-signed:
```
# cat /proc/version_
Ubuntu 6.2.0-1004.4-aws 6.2.6
```
This is a production server, so we'd appreciate any and all help diagnosing and solving this issue!
The server is a u-12tb1.112xlarge instance with 12 TB of RAM, and it is losing more than 1 TB of memory a day to a kernel leak.
For example, at an uptime of 3.5 days only 1.8 TiB is available, yet process RSS plus slab accounts for only about 4.1 TB, leaving roughly 6 TiB of the 11.8 TiB `MemTotal` unexplained.
All running processes together use about 4 TB of RAM: `ps -eo rss | awk 'BEGIN {x=0} {x = x + $1} END {print x}'` gives 4088636708 KiB (about 3.8 TiB).
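`ps` reports RSS in KiB, so the total above is about 3.8 TiB. Below is a minimal sketch of the same sum with the unit conversion made explicit; the function name `sum_rss_tib` is hypothetical. Keep in mind that summing RSS counts a shared page (for example a `/dev/shm` segment mapped by several processes) once per process that maps it, so the result is an upper bound on real process memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum an RSS column (KiB, one value per line, as printed
# by `ps -eo rss=` with the header suppressed) and print the total in TiB.
sum_rss_tib() {
  awk '{ total += $1 } END { printf "%.2f TiB\n", total / 1024 / 1024 / 1024 }'
}

# Usage on a live system:
#   ps -eo rss= | sum_rss_tib
```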
According to slabtop, about 116 GiB is consumed by slab (`slabtop -o -s t | head`):
```
Active / Total Objects (% used)    : 303580174 / 332642344 (91.3%)
Active / Total Slabs (% used)      : 6697552 / 6697552 (100.0%)
Active / Total Caches (% used)     : 158 / 215 (73.5%)
Active / Total Size (% used)       : 112801663.93K / 121442845.45K (92.9%)
Minimum / Average / Maximum Object : 0.01K / 0.36K / 16.00K

    OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
67537280 59696907  88%    0.03K  527635      128   2110540K kmalloc-32
65247564 65241398  99%    0.31K 1279364       51  20469824K arc_buf_hdr_t_full
58270446 58040685  99%    0.10K  747057       78   5976456K abd_t
16697268 13731405  82%    0.38K  397554       42   6360864K dmu_buf_impl_t
15982912 10366686  64%    0.50K  249733       64   7991456K kmalloc-512
14975616 11605380  77%    0.06K  233994       64    935976K kmalloc-64
```
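Three of the largest caches above (`arc_buf_hdr_t_full`, `abd_t`, `dmu_buf_impl_t`) belong to OpenZFS, which suggests ZFS is in use on this host. The ARC's data buffers are generally not reported under `Cached` in /proc/meminfo, so a large ARC can look like unexplained kernel memory. As a sketch, assuming the OpenZFS module is loaded and exposes /proc/spl/kstat/zfs/arcstats, the current ARC size could be read as follows (the function name `arc_size_gib` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current ZFS ARC size in GiB.
# Assumes the OpenZFS kstat layout: rows of "name type data".
arc_size_gib() {
  local stats="${1:-/proc/spl/kstat/zfs/arcstats}"
  # The row named exactly "size" holds the ARC's current size in bytes (column 3).
  awk '$1 == "size" { printf "%.2f GiB\n", $3 / 1024 / 1024 / 1024 }' "$stats"
}

# Usage on a live system (c_max is the configured ARC ceiling, in bytes):
#   arc_size_gib
#   grep -w c_max /proc/spl/kstat/zfs/arcstats
```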
In /proc/meminfo:
```
MemTotal: 12656421408 kB
MemFree: 1975976204 kB
MemAvailable: 1968415088 kB
Buffers: 1087956 kB
Cached: 101168004 kB
SwapCached: 17912340 kB
Active: 101022084 kB
Inactive: 4129984264 kB
Active(anon): 94623216 kB
Inactive(anon): 4104673512 kB
Active(file): 6398868 kB
Inactive(file): 25310752 kB
Unevictable: 338908 kB
Mlocked: 332132 kB
SwapTotal: 4294967292 kB
SwapFree: 3500705532 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 2908 kB
Writeback: 0 kB
AnonPages: 4123489132 kB
Mapped: 3761620 kB
Shmem: 70756156 kB
KReclaimable: 10319220 kB
Slab: 122355620 kB
SReclaimable: 10319220 kB
SUnreclaim: 112036400 kB
KernelStack: 1793296 kB
PageTables: 21748556 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10623177996 kB
Committed_AS: 6775476544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 296984480 kB
VmallocChunk: 0 kB
Percpu: 1326080 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1630980096 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 2056036 kB
DirectMap2M: 40935424 kB
DirectMap1G: 12814647296 kB
```
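One way to quantify the gap is to subtract the major /proc/meminfo counters from `MemTotal`. The sketch below uses one common set of counters; that list is an assumption, not an official kernel formula, and the function name `unaccounted_kb` is hypothetical. Applied to the values above it gives 6289564220 kB, roughly 5.9 TiB not attributed to any of these counters. `VmallocUsed` (about 283 GiB here) is not included in the sum; subtracting it as well still leaves about 5.6 TiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate memory (in kB) that none of the listed
# /proc/meminfo counters accounts for. The counter list is an assumption.
unaccounted_kb() {
  awk '
    # Field 1 is "Name:", field 2 the value in kB; drop the colon and store it.
    { sub(/:$/, "", $1); kb[$1] = $2 }
    END {
      n = split("MemFree Buffers Cached SwapCached AnonPages Slab KernelStack PageTables Percpu", names, " ")
      used = 0
      for (i = 1; i <= n; i++) used += kb[names[i]]
      # %.0f avoids 32-bit overflow of %d in some awk implementations.
      printf "%.0f\n", kb["MemTotal"] - used
    }' "${1:-/proc/meminfo}"
}

# Usage on a live system:
#   unaccounted_kb
```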
It's not a tmpfs/shm issue either:
```
# df -h | grep -E 'tmpfs|shm'
tmpfs            256G   70G  187G  27% /dev/shm
tmpfs            256G  3.4M  256G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
tmpfs            8.0G   24K  8.0G   1% /run/user/10102
tmpfs            8.0G   24K  8.0G   1% /run/user/1002
tmpfs            8.0G   24K  8.0G   1% /run/user/10030
tmpfs            8.0G   24K  8.0G   1% /run/user/10194
tmpfs            8.0G   24K  8.0G   1% /run/user/10200
tmpfs            8.0G   24K  8.0G   1% /run/user/10136
tmpfs            8.0G   24K  8.0G   1% /run/user/10198
tmpfs            8.0G   24K  8.0G   1% /run/user/10143
tmpfs            8.0G   24K  8.0G   1% /run/user/10188
tmpfs            8.0G   24K  8.0G   1% /run/user/10124
tmpfs            8.0G   24K  8.0G   1% /run/user/10174
tmpfs            8.0G   24K  8.0G   1% /run/user/10165
tmpfs            8.0G   24K  8.0G   1% /run/user/10197
tmpfs            8.0G   24K  8.0G   1% /run/user/10183
tmpfs            8.0G   24K  8.0G   1% /run/user/10033
tmpfs            8.0G   24K  8.0G   1% /run/user/10023
tmpfs            8.0G   24K  8.0G   1% /run/user/10133
tmpfs            8.0G   24K  8.0G   1% /run/user/10185
tmpfs            8.0G   24K  8.0G   1% /run/user/10201
tmpfs            8.0G   24K  8.0G   1% /run/user/1004
tmpfs            8.0G   24K  8.0G   1% /run/user/10014
```
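To compare tmpfs usage with the `Shmem` counter (70756156 kB in the /proc/meminfo output above) without reading through every per-user mount, the `Used` column can be summed in KiB. This is a sketch assuming GNU coreutils `df` (for `-t` and `--output`); the function name `tmpfs_used_kib` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total "Used" KiB across tmpfs mounts.
# Reads df output on stdin, skips the header line, and sums column 1.
tmpfs_used_kib() {
  awk 'NR > 1 { total += $1 } END { printf "%.0f\n", total }'
}

# Usage on a live system (GNU df: -t filters by filesystem type,
# --output=used selects the column, -k forces 1K blocks):
#   df -k -t tmpfs --output=used | tmpfs_used_kib
```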
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: N/A
CasperMD5CheckR
DistroRelease: Ubuntu 22.04
Ec2AMI: ami-08c40ec9ead
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: u-12tb1.112xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci: Error: [Errno 2] No such file or directory: 'lspci'
Lspci-vt: Error: [Errno 2] No such file or directory: 'lspci'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:
Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: Amazon EC2 u-12tb1.112xlarge
NonfreeKernelMo
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
LC_CTYPE=C.UTF-8
TERM=xterm-
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy ec2-images
Uname: Linux 6.2.0-1004-aws x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: False
dmi.bios.date: 10/16/2017
dmi.bios.release: 1.0
dmi.bios.vendor: Amazon EC2
dmi.bios.version: 1.0
dmi.board.
dmi.board.vendor: Amazon EC2
dmi.chassis.
dmi.chassis.type: 1
dmi.chassis.vendor: Amazon EC2
dmi.modalias: dmi:bvnAmazonEC
dmi.product.name: u-12tb1.112xlarge
dmi.sys.vendor: Amazon EC2
Changed in linux (Ubuntu): status: Incomplete → Confirmed