memory leak with unknown reason after upgrade to utopic

Bug #1401817 reported by Markus Wagner
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid  Medium      Unassigned

Bug Description

I'm not sure if this is the right component/package. We are not able to identify a package/program which could be responsible for this bug, so we assume it is the kernel or at least a kernel configuration.

Since the upgrade from trusty to utopic on October 27, our three package-building servers have been swapping massively. This makes build times slower by a factor of four. There were no configuration changes at all.

Munin graphs of one of the affected servers are attached.

The system has 48GB memory, and we have configured several ramdisks:

root@cx1000-02:~# mount | grep -E "tmpfs|ramfs"
none on /sys/fs/cgroup type tmpfs (rw,uid=0,gid=0,mode=0755,size=1024)
udev on /dev type devtmpfs (rw,mode=0755)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,nodev,noexec,nosuid,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,nodev,noexec,nosuid,size=104857600,mode=0755)
tmpfs on /var/tmp/obs type tmpfs (rw,size=36G)
tmpfs on /run/obs type tmpfs (rw,relatime,size=1G)
ramfs on /var/obscache type ramfs (rw)
tmpfs on /run/obs type tmpfs (rw,relatime,size=1048576k)

The build service uses several chroot environments to build packages for several distributions (http://openbuildservice.org/).

'top' does not show any process that allocates this much memory. Even after stopping the build service and unmounting /var/tmp/obs, it is not possible to deactivate the swap via swapoff -a:

root@cx1000-02:~# swapoff -a
swapoff: /dev/sda5: swapoff failed: Cannot allocate memory
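
Memory held in tmpfs or ramfs is not attributed to any process, which is one possible reason nothing stands out in 'top'. The following is a minimal sketch, not part of the original report, that picks the RAM-backed entries out of `mount` output like the listing above and, on a live system, asks the kernel how much space a mount currently uses. The helper names `ram_backed_mounts` and `used_kib` are made up for illustration, and the sample lines are copied from the listing.

```python
import os
import re

# Matches lines printed by `mount`, e.g.
# "tmpfs on /var/tmp/obs type tmpfs (rw,size=36G)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def ram_backed_mounts(mount_output):
    """Return (mountpoint, fstype, options) for tmpfs and ramfs entries."""
    result = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m and m.group(3) in ("tmpfs", "ramfs"):
            result.append((m.group(2), m.group(3), m.group(4).split(",")))
    return result


def used_kib(mountpoint):
    """Space currently used on a mounted filesystem, in KiB, via statvfs."""
    st = os.statvfs(mountpoint)
    return (st.f_blocks - st.f_bfree) * st.f_frsize // 1024


# Sample lines copied from the mount listing in the bug description.
SAMPLE = """\
tmpfs on /var/tmp/obs type tmpfs (rw,size=36G)
ramfs on /var/obscache type ramfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
"""

for mountpoint, fstype, options in ram_backed_mounts(SAMPLE):
    print(mountpoint, fstype, options)
```

On an affected machine, feeding the real `mount` output in and calling `used_kib(mountpoint)` for each tmpfs entry would give a rough total to compare against the "cached" figure. The sketch assumes tmpfs reports block counts through statvfs; it makes no such assumption for ramfs.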

We also tested the latest kernel (3.16.0-28) and an old trusty kernel (3.13.0-40) without success. Patching perl to rule out the perl bug https://rt.perl.org/Public/Bug/Display.html?id=123198 as the root cause did not help either.

We don't have any idea what's the cause for this problem...

We can deploy debug packages or anything else on the system, so please hesitate to ask for any information or help.

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: linux-image-3.16.0-26-generic 3.16.0-26.35
ProcVersionSignature: Ubuntu 3.16.0-26.35-generic 3.16.7-ckt1
Uname: Linux 3.16.0-26-generic x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
Date: Fri Dec 12 09:54:36 2014
HibernationDevice: RESUME=UUID=531cb8f2-e194-45a7-a13c-3167aacf89bf
InstallationDate: Installed on 2011-09-20 (1178 days ago)
InstallationMedia: Ubuntu-Server 11.04 "Natty Narwhal" - Release amd64 (20110426)
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
MachineType: FUJITSU PRIMERGY CX122 S1
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-26-generic root=UUID=6e5b9bee-ccdf-4e6f-b09d-759312aeb3e0 ro quiet
RelatedPackageVersions:
 linux-restricted-modules-3.16.0-26-generic N/A
 linux-backports-modules-3.16.0-26-generic N/A
 linux-firmware 1.138
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to utopic on 2014-10-27 (45 days ago)
dmi.bios.date: 05/24/2011
dmi.bios.vendor: FUJITSU // Phoenix Technologies Ltd.
dmi.bios.version: 6.00 Rev. 1.10.2899.A1
dmi.board.asset.tag: -
dmi.board.name: D2899
dmi.board.vendor: FUJITSU
dmi.board.version: S26361-D2899-A11 WGS02 GS01
dmi.chassis.asset.tag: YLAM001377
dmi.chassis.type: 23
dmi.chassis.vendor: FUJITSU
dmi.chassis.version: CX122S1R
dmi.modalias: dmi:bvnFUJITSU//PhoenixTechnologiesLtd.:bvr6.00Rev.1.10.2899.A1:bd05/24/2011:svnFUJITSU:pnPRIMERGYCX122S1:pvrGS01:rvnFUJITSU:rnD2899:rvrS26361-D2899-A11WGS02GS01:cvnFUJITSU:ct23:cvrCX122S1R:
dmi.product.name: PRIMERGY CX122 S1
dmi.product.version: GS01
dmi.sys.vendor: FUJITSU

information type: Public → Private Security
information type: Private Security → Public
Revision history for this message
Markus Wagner (markus-wagner) wrote :

sorry, typo: ...so please DON'T hesitate to ask for any information or help.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Chris J Arges (arges) wrote :

Hi,
Could you run 'top' then type 'M' to see which processes are using the most memory?
--chris

Revision history for this message
Marcus Klein (kleini76) wrote :

I can reproduce this very easily. Just download a 30G file with wget. The kernel fills up your memory until the machine starts swapping. The area consuming the most memory is the cache.

$ free
             total used free shared buffers cached
Mem: 16374340 15801164 573176 138524 88 9082352
-/+ buffers/cache: 6718724 9655616
Swap: 16777208 1397596 15379612

Image is stored in an XFS filesystem on a rotating disc partition.

Revision history for this message
Marcus Klein (kleini76) wrote :

Hmm, sorry. The downloaded file is stored in an XFS filesystem on a rotating disc partition. So there must be something wrong in the kernel's block cache implementation that causes memory usage to be higher than the available memory. Consuming more memory for block caches than is currently available leads to swapping. This does not make sense at all...

Revision history for this message
Markus Wagner (markus-wagner) wrote :

here is the top output:

top - 17:56:06 up 1 day, 8:48, 1 user, load average: 0.00, 2.67, 14.10
Tasks: 269 total, 1 running, 268 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49452964 total, 45681296 used, 3771668 free, 16276 buffers
KiB Swap: 50320380 total, 15316768 used, 35003612 free. 43456648 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16739 root 1 -19 76904 18980 4304 S 0.0 0.0 0:04.91 bs_worker
16746 root 1 -19 76904 18980 4304 S 0.0 0.0 0:05.08 bs_worker
16736 root 1 -19 76832 18976 4304 S 0.0 0.0 0:05.20 bs_worker
16745 root 1 -19 76904 18976 4304 S 0.0 0.0 0:05.16 bs_worker
16731 root 1 -19 76904 18960 4284 S 0.0 0.0 0:05.14 bs_worker
16740 root 1 -19 76884 18952 4292 S 0.0 0.0 0:05.21 bs_worker
16737 root 1 -19 76832 18936 4264 S 0.0 0.0 0:05.13 bs_worker
16741 root 1 -19 76904 18932 4256 S 0.0 0.0 0:04.86 bs_worker
16743 root 1 -19 76904 18932 4256 S 0.0 0.0 0:04.93 bs_worker
16742 root 1 -19 76836 18888 4216 S 0.0 0.0 0:05.11 bs_worker
16744 root 1 -19 76904 18880 4212 S 0.0 0.0 0:05.10 bs_worker
16725 root 1 -19 76884 18872 4216 S 0.0 0.0 0:05.03 bs_worker
16735 root 1 -19 76832 18872 4200 S...

Revision history for this message
Markus Wagner (markus-wagner) wrote :

and another one with corresponding ps aux:

top - 18:04:02 up 1 day, 8:56, 1 user, load average: 0.04, 0.80, 8.61
Tasks: 269 total, 1 running, 268 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49452964 total, 45882236 used, 3570728 free, 17084 buffers
KiB Swap: 50320380 total, 15316768 used, 35003612 free. 43639820 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16739 root 1 -19 76904 18980 4304 S 0.0 0.0 0:04.95 bs_worker
16746 root 1 -19 76904 18980 4304 S 0.0 0.0 0:05.14 bs_worker
16736 root 1 -19 76832 18976 4304 S 0.0 0.0 0:05.25 bs_worker
16745 root 1 -19 76904 18976 4304 S 0.0 0.0 0:05.21 bs_worker
16731 root 1 -19 76904 18960 4284 S 0.0 0.0 0:05.20 bs_worker
16740 root 1 -19 76884 18952 4292 S 0.0 0.0 0:05.26 bs_worker
16737 root 1 -19 76832 18936 4264 S 0.0 0.0 0:05.18 bs_worker
16741 root 1 -19 76904 18932 4256 S 0.0 0.0 0:04.92 bs_worker
16743 root 1 -19 76904 18932 4256 S 0.0 0.0 0:05.00 bs_worker
16742 root 1 -19 76836 18888 4216 S 0.0 0.0 0:05.16 bs_worker
16744 root 1 -19 76904 18880 4212 S 0.0 0.0 0:05.15 bs_worker
16725 root 1 -19 76884 18872 4216 S 0.3 0.0 0:05.09 bs_worker
16735 root 1 -19 768...

Revision history for this message
Markus Wagner (markus-wagner) wrote :

top after some hours of idle:

top - 22:37:02 up 1 day, 13:29, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 268 total, 3 running, 265 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.3 sy, 0.0 ni, 98.7 id, 0.4 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49452964 total, 45896468 used, 3556496 free, 19260 buffers
KiB Swap: 50320380 total, 15316760 used, 35003620 free. 43644124 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16739 root 1 -19 76904 18980 4304 S 0.1 0.0 0:06.64 bs_worker
16745 root 1 -19 76904 18980 4304 S 0.0 0.0 0:06.91 bs_worker
16746 root 1 -19 76904 18980 4304 S 0.0 0.0 0:06.80 bs_worker
16736 root 1 -19 76832 18976 4304 S 0.1 0.0 0:06.90 bs_worker
16731 root 1 -19 76904 18960 4284 S 0.0 0.0 0:06.90 bs_worker
16740 root 1 -19 76884 18952 4292 S 0.1 0.0 0:06.98 bs_worker
16737 root 1 -19 76832 18936 4264 S 0.0 0.0 0:06.83 bs_worker
16741 root 1 -19 76904 18932 4256 S 0.0 0.0 0:06.42 bs_worker
16743 root 1 -19 76904 18932 4256 S 0.1 0.0 0:06.67 bs_worker
16744 root 1 -19 76904 18892 4212 S 0.0 0.0 0:06.87 bs_worker
16742 root 1 -19 76836 18888 4216 S 0.0 0.0 0:06.79 bs_worker
16725 root 1 -19 76884 18872 4216 S 0.0 0.0 0:06.77 bs_worker
16735 root 1 -19 76832 18872 4...

Revision history for this message
Markus Wagner (markus-wagner) wrote :

top after stopping the buildservice:

top - 22:37:55 up 1 day, 13:30, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 245 total, 2 running, 243 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49452964 total, 45567572 used, 3885392 free, 19424 buffers
KiB Swap: 50320380 total, 15316644 used, 35003736 free. 43645264 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  805 root 20 0 52780 13564 3540 S 0.0 0.0 0:05.05 munin-node
 3351 root 20 0 99532 6456 5480 S 0.0 0.0 0:00.03 sshd
 3385 root 20 0 26716 5344 3376 S 0.0 0.0 0:00.05 bash
 1496 root 20 0 123748 3976 3376 S 0.0 0.0 0:06.23 thermald
 1243 root 20 0 10272 3928 1620 S 0.0 0.0 0:00.34 dhclient
 1532 root 20 0 59580 3672 2984 S 0.0 0.0 0:00.01 sshd
 1679 ntp 20 0 31488 3336 2728 S 0.0 0.0 0:06.35 ntpd
 3987 root 20 0 29148 3288 2664 R 0.1 0.0 0:00.02 top
  628 syslog 20 0 255864 3164 1916 S 0.0 0.0 0:00.59 rsyslogd
    1 root 20 0 29048 3160 1936 S 0.0 0.0 0:04.24 init
 3038 root 20 0 36764 3028 2712 S 0.0 0.0 0:00.03 systemd-logind
  698 message+ 20 0 43992 2644 2156 S 0.0 0.0 0:00.12 dbus-daemon
  405 root 20 0 23704 2...

Revision history for this message
Markus Wagner (markus-wagner) wrote :

top after unmounting the big ramdisk:

top - 22:39:08 up 1 day, 13:31, 1 user, load average: 0.05, 0.03, 0.05
Tasks: 245 total, 1 running, 244 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49452964 total, 20996188 used, 28456776 free, 19580 buffers
KiB Swap: 50320380 total, 15316644 used, 35003736 free. 19612676 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  805 root 20 0 52780 13564 3540 S 0.0 0.0 0:05.05 munin-node
 3351 root 20 0 99532 6456 5480 S 0.0 0.0 0:00.05 sshd
 3385 root 20 0 26788 5416 3376 S 0.0 0.0 0:00.09 bash
 1496 root 20 0 123748 3976 3376 S 0.0 0.0 0:06.23 thermald
 1243 root 20 0 10272 3928 1620 S 0.0 0.0 0:00.34 dhclient
 1532 root 20 0 59580 3672 2984 S 0.0 0.0 0:00.01 sshd
 1679 ntp 20 0 31488 3336 2728 S 0.0 0.0 0:06.37 ntpd
 4074 root 20 0 29148 3224 2600 R 0.1 0.0 0:00.02 top
  628 syslog 20 0 255864 3164 1916 S 0.0 0.0 0:00.59 rsyslogd
    1 root 20 0 29048 3160 1936 S 0.0 0.0 0:04.24 init
 3038 root 20 0 36764 3028 2712 S 0.0 0.0 0:00.03 systemd-logind
  698 message+ 20 0 43992 2644 2156 S 0.0 0.0 0:00.12 dbus-daemon
  405 root 20 0 23704 ...

Revision history for this message
Markus Wagner (markus-wagner) wrote :

root@cx1000-02:~# free
             total used free shared buffers cached
Mem: 49452964 20993700 28459264 9296160 19588 19612676
-/+ buffers/cache: 1361436 48091528
Swap: 50320380 15316644 35003736
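
For readers checking the numbers, the "-/+ buffers/cache" row can be re-derived from the "Mem:" row. The short calculation below is an illustration added to the thread, using only the figures from the `free` output above. The last figure is not printed by `free`; it assumes that the "shared" pages are counted inside "cached", as is generally the case on kernels of this era.

```python
# Values in KiB, copied from the `free` output above.
total, used, free_, shared, buffers, cached = (
    49452964, 20993700, 28459264, 9296160, 19588, 19612676)

# Sanity check: used and free add up to the total.
assert used + free_ == total

# "-/+ buffers/cache" row: memory used by applications, and memory that
# would be free if buffers and cache were given back.
used_without_cache = used - buffers - cached
free_with_cache = free_ + buffers + cached

# Assumption: "shared" (tmpfs and shared memory) is part of "cached".
# Under that assumption, this is the cache that is not shared memory.
cached_not_shared = cached - shared

print(used_without_cache)  # 1361436
print(free_with_cache)     # 48091528
print(cached_not_shared)   # 10316516
```

The first two results match the "-/+ buffers/cache" row exactly, so applications account for only about 1.3 GiB while roughly 15 GB of swap remains in use.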

Revision history for this message
Markus Wagner (markus-wagner) wrote :

top directly after reboot (of another, identical system):

top - 22:51:44 up 2 min, 1 user, load average: 0.53, 0.72, 0.32
Tasks: 258 total, 1 running, 257 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49453024 total, 861076 used, 48591948 free, 30064 buffers
KiB Swap: 50319356 total, 0 used, 50319356 free. 68324 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1924 root 1 -19 69752 19028 4488 S 0.0 0.0 0:00.25 bs_worker
 1918 root 1 -19 69732 18948 4408 S 0.0 0.0 0:00.27 bs_worker
 1910 root 1 -19 69732 18928 4396 S 0.0 0.0 0:00.22 bs_worker
 1911 root 1 -19 69752 18924 4400 S 0.0 0.0 0:00.25 bs_worker
 1926 root 1 -19 69752 18872 4400 S 0.0 0.0 0:00.23 bs_worker
 1919 root 1 -19 69752 18812 4416 S 0.0 0.0 0:00.24 bs_worker
 1915 root 1 -19 69732 18780 4388 S 0.0 0.0 0:00.22 bs_worker
 1927 root 1 -19 69752 18772 4284 S 0.0 0.0 0:00.23 bs_worker
 1921 root 1 -19 69752 18756 4268 S 0.0 0.0 0:00.24 bs_worker
 1913 root 1 -19 69752 18736 4280 S 0.0 0.0 0:00.25 bs_worker
 1923 root 1 -19 69732 18736 4268 S 0.0 0.0 0:00.24 bs_worker
 1929 root 1 -19 69732 18736 4284 S 0.0 0.0 0:00.27 bs_worker
 1917 root 1 -...

penalvch (penalvch)
tags: added: bios-outdated-6.00.1.13
Revision history for this message
Markus Wagner (markus-wagner) wrote :

bios of worker 03 has been updated to 6.00.1.13:

[ 0.000000] DMI: FUJITSU PRIMERGY CX122 S1 /D2899, BIOS 6.00 Rev. 1.13.2899.A1 01/19/2012

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Revision history for this message
Markus Wagner (markus-wagner) wrote :

the upgraded bios didn't help, the system just starts to swap again

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Does this issue go away if you boot back into the trusty kernel?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-vivid/

Revision history for this message
Markus Wagner (markus-wagner) wrote :

the trusty kernel didn't help at all.

worker 03 has been updated to mainline kernel:

root@cx1000-03:~# uname -a
Linux cx1000-03 3.18.0-031800-generic #201412071935 SMP Mon Dec 8 00:36:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Markus Wagner (markus-wagner) wrote :

the bug also occurs with the upstream kernel

tags: added: kernel-bug-exists-upstream
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

So booting the Trusty kernel on a Utopic install does not fix the issue? That may indicate a userspace issue. In the bug description, you say: "Since the upgrade from trusty to utopic at October 27." Would you be able to re-install Trusty on a machine and confirm that it does not exhibit the bug? If it does not, we may want to try booting a Utopic kernel on a Trusty install.

Revision history for this message
Markus Wagner (markus-wagner) wrote :

worker 01 has been reinstalled with trusty

Revision history for this message
Markus Wagner (markus-wagner) wrote :

short summary:

worker01: trusty, unmodified => no problems
worker02: utopic, patched perl => massive swap use
worker03: utopic, mainline kernel (3.18.2-031802-generic #201501082011) => massive swap use

what should we do next? boot the utopic or mainline kernel on trusty?

Revision history for this message
Markus Wagner (markus-wagner) wrote :

worker 01 now has trusty with a 3.19.0 mainline kernel:

root@cx1000-01:~# cat /etc/issue
Ubuntu 14.04.1 LTS \n \l

root@cx1000-01:~# uname -a
Linux cx1000-01 3.19.0-031900rc4-generic #201501112135 SMP Sun Jan 11 21:36:48 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@cx1000-01:~#

Revision history for this message
Markus Wagner (markus-wagner) wrote :

trusty + mainline kernel works like a charm, worker 2 and 3 are swapping again...

Revision history for this message
Marcus Klein (kleini76) wrote :

So it must be some setting within the system that affects this behavior. Any hints from the Ubuntu guys what we should look for?

Revision history for this message
Marcus Klein (kleini76) wrote :

Ping. Do you have any hints where to look for the cause of this very different caching behaviour of the latest mainline kernel?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest 3.13 upstream stable kernel?
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11-ckt14-trusty/

This will tell us if the fix in mainline already made its way into stable updates.

Revision history for this message
Marcus Klein (kleini76) wrote :

Downgraded kernel on one machine with Utopic to kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11-ckt14-trusty/
The machine with Trusty installed has now been running for 6 days without swapping, under the same load as all the other machines.
Rebooted the second Utopic machine to get an indicator of when swapping starts again. Will then check whether the above-mentioned machine with the downgraded kernel swaps, too.

Revision history for this message
Marcus Klein (kleini76) wrote :

Now we have:
- worker1: Trusty with Trusty kernel, no swapping
- worker2: Utopic with Kernel 3.13.11-031311ckt14-generic, swapping
- worker3: Utopic with Kernel 3.18.2-031802-generic, swapping

So we still encounter the swapping problem on Utopic installations, no matter which kernel is installed:
- Trusty kernel
- Kernel 3.13.11-031311ckt14-generic
- Kernel 3.18.2-031802-generic

Again my question: What can be different in Utopic installation causing this swapping kernel behavior? Is there any tooling that allows comparing kernel parameters?
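
One low-tech way to compare kernel parameters is to save the output of `sysctl -a` on a Trusty worker and on a Utopic worker and diff the two dumps. The sketch below is a suggestion added here, not something that was run in this thread. The function names are made up, the sample values are invented for illustration, and `vm.example_only` is not a real parameter.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def diff_sysctl(a_text, b_text):
    """Return {key: (value_a, value_b)} for every key whose value differs.

    A key missing on one side is reported with None for that side.
    """
    a, b = parse_sysctl(a_text), parse_sysctl(b_text)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Invented sample values for illustration only, not taken from the workers.
trusty = "vm.swappiness = 60\nvm.vfs_cache_pressure = 100\n"
utopic = "vm.swappiness = 60\nvm.vfs_cache_pressure = 50\nvm.example_only = 1\n"

for key, (old, new) in diff_sysctl(trusty, utopic).items():
    print(f"{key}: {old} -> {new}")
```

With real dumps, the two strings would be read from files captured on each worker. Parameters that only exist on one kernel version show up with None on the other side, so comparing two installs running the same kernel gives the cleanest result.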

Revision history for this message
Marcus Klein (kleini76) wrote :

During calendar weeks 6, 7, 8 and 9 this problem did not occur anymore. Strange enough...

Changed in linux (Ubuntu):
status: Confirmed → Invalid