kswapd0 100% CPU usage

Bug #1518457 reported by Sam Lade on 2015-11-20
This bug affects 131 people
Affects           Status     Importance  Assigned to    Milestone
Linux             Confirmed  Medium
linux (Ubuntu)               High        Dan Streetman
  Xenial                     High        Dan Streetman
  Yakkety                    High        Dan Streetman

Bug Description

As per bug 721896 and various others:

I'm on an AWS t2.micro instance (Xeon E5-2670, 991MiB of memory). Occasionally (about once a day), kswapd0 falls into a busy loop and spins at 100% CPU usage indefinitely. This can be provoked by copying or writing large files (e.g. using dd to write a 256MB file), but it also happens at other times. System memory usage (not including buffers/caches) currently sits at 36%, which is typical[1]. Initially I had no swap space configured; I've since tried enabling a 256MB swap file, but the problem continues to occur and no swap space is used. The system can be recovered with `echo 1 > /proc/sys/vm/drop_caches`.

Happy to provide further information/take further debugging actions.

[1] Full output from `free`:
             total       used       free     shared    buffers     cached
Mem:       1014936     483448     531488      28556       9756     112700
-/+ buffers/cache:     360992     653944
Swap:       262140          0     262140
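For reference, the 36% figure above comes from the "-/+ buffers/cache" row that older versions of `free` print: "used" minus buffers minus cached, divided by total. A small C sketch of that arithmetic (the struct and function names are illustrative, not taken from procps):

```c
/* Values in KiB, in the column order printed by older procps `free`. */
struct free_output {
    long total, used, free, shared, buffers, cached;
};

/* "used" column of the "-/+ buffers/cache" row: memory used by
 * processes, excluding buffers and page cache. */
long used_excluding_cache(const struct free_output *m)
{
    return m->used - m->buffers - m->cached;
}

/* "free" column of the same row: free memory plus buffers and cache,
 * i.e. what would be available if the caches were reclaimed. */
long free_including_cache(const struct free_output *m)
{
    return m->free + m->buffers + m->cached;
}

/* Percentage of total memory in use, not counting buffers/cache. */
double used_percent(const struct free_output *m)
{
    return 100.0 * (double)used_excluding_cache(m) / (double)m->total;
}
```

With the `Mem:` values above this reproduces the 360992 / 653944 row and about 35.6% used, which rounds to the 36% quoted in the report.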

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: linux-image-4.2.0-18-generic 4.2.0-18.22
ProcVersionSignature: Ubuntu 4.2.0-18.22-generic 4.2.3
Uname: Linux 4.2.0-18-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 19 19:40 seq
 crw-rw---- 1 root audio 116, 33 Nov 19 19:40 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.19.1-0ubuntu5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
Date: Fri Nov 20 20:44:30 2015
Ec2AMI: ami-1c552a76
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: t2.micro
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
MachineType: Xen HVM domU
PciMultimedia:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 xen
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-18-generic root=UUID=35bc01f4-4602-4823-976e-508edef899df ro console=tty1 console=ttyS0 net.ifnames=0
RelatedPackageVersions:
 linux-restricted-modules-4.2.0-18-generic N/A
 linux-backports-modules-4.2.0-18-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/06/2015
dmi.bios.vendor: Xen
dmi.bios.version: 4.2.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd05/06/2015:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.2.amazon
dmi.sys.vendor: Xen


Sam Lade (sam-sentynel) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
mecat (habdankm) wrote :

Same issue here:
root@orangepi:/var/log# uname -a
Linux orangepi 3.4.39 #2 SMP PREEMPT Mon Oct 12 12:03:03 CEST 2015 armv7l armv7l armv7l GNU/Linux
root@orangepi:/var/log# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"
Also, echo 1 > /proc/sys/vm/drop_caches temporarily solves the issue.

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc2+cod1-wily/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key
Sam Lade (sam-sentynel) wrote :

This was a clean build, so I don't have any information about previous versions unfortunately. (The previous server, which didn't have this issue, was different AWS hardware and the previous Ubuntu version.)

I've tested with the latest mainline kernel and this is still occurring.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Sean Groarke (sgroarke) wrote :

Pretty much the same description here. It started when I upgraded my Amazon instance to 15.10.

It's causing a lot of disruption; I'm also available to test if it helps move this forward.

Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

4.0 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-vivid/
4.1 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1-wily/
4.2 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-wily/

You don't have to test every kernel, just up until the kernel that first has this bug. We can then narrow down further by testing some release candidates.

Thanks in advance!

Changed in linux (Ubuntu):
importance: Medium → High
Sam Lade (sam-sentynel) wrote :

Okay, I cloned my server and tried kernel versions. The latest version which does _not_ exhibit the issue is 3.12.51. The first which does is 3.13-rc1.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Sam. Could you also test the 3.12 final version, since 3.13-rc1 is the next linear version after 3.12 final. The kernel can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-trusty/

Sam Lade (sam-sentynel) wrote :

3.12 final doesn't exhibit the issue either.

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v3.12 final and v3.13-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
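Each test kernel in this bisect halves the remaining range of suspect commits, which is why a whole merge window can be searched with only a handful of builds. A simplified C sketch of that search over a linear list of commits; real `git bisect` walks the commit graph and can also skip untestable commits (as happens later in this thread when test kernels crash on boot), and neither is modelled here. `demo_is_bad` is a hypothetical stand-in for "boot this kernel and watch kswapd0":

```c
#include <stdbool.h>
#include <stddef.h>

/* Predicate type: returns true if the build at commit index i shows the bug. */
typedef bool (*is_bad_fn)(size_t i);

/* Hypothetical stand-in for a real test run: pretend the regression
 * landed at commit index demo_first_bad. */
size_t demo_first_bad = 0;
bool demo_is_bad(size_t i) { return i >= demo_first_bad; }

/* Commits are numbered 0..n-1 in order (n >= 2); commit 0 is known good
 * and commit n-1 known bad. Returns the index of the first bad commit and
 * stores the number of test builds needed in *tests_run. Assumes a single
 * good->bad transition and reproducible results. */
size_t first_bad(size_t n, is_bad_fn is_bad, size_t *tests_run)
{
    size_t good = 0, bad = n - 1;

    *tests_run = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*tests_run;
        if (is_bad(mid))
            bad = mid;   /* bug present: first bad commit is at or before mid */
        else
            good = mid;  /* no bug: first bad commit is after mid */
    }
    return bad;
}
```

For 10,000 commits the loop needs at most 14 tests (the ceiling of log2 9999); a flaky reproducer, where a "no bug" result might just mean the bug didn't trigger yet, breaks the single-transition assumption.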

Sam Lade (sam-sentynel) wrote :

No bug on that version.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
e1f56c89b040134add93f686931cc266541d239a

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

No bug on that version.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
9073e1a804c3096eda84ee7cbf11d1f174236c75

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Bug is present in this version.

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
ab0169bb5cc4a5c86756dde662087f9d12302eb0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

No bug in that version.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
f080480488028bcc25357f85e8ae54ccc3bb7173

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Bug is present in this version.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b746f9c7941f227ad582b4f0bc981f3adcbc46b2

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

No bug in that version.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
72c1253574a1854b0b6f196e24cd0dd08c1ad9b9

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

It's crashing on boot with this version. The crash is related to paging, so it might be relevant to the issue; I've attached the full dmesg, and here's the actual crash:

[ 3.716345] BUG: unable to handle kernel paging request at 000060ffc0002370
[ 3.720056] IP: [<ffffffff811a6ae1>] mem_cgroup_move_account+0xd1/0x250
[ 3.720056] PGD 0
[ 3.720056] Oops: 0000 [#1] SMP
[ 3.720056] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_recent xt_conntrack nf_conntrack iptable_filter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy pata_acpi
[ 3.720056] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0-031200rc2-generic #201512161751
[ 3.720056] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[ 3.720056] Workqueue: events css_killed_work_fn
[ 3.720056] task: ffff88003da946b0 ti: ffff88003daac000 task.ti: ffff88003daac000
[ 3.720056] RIP: 0010:[<ffffffff811a6ae1>] [<ffffffff811a6ae1>] mem_cgroup_move_account+0xd1/0x250
[ 3.720056] RSP: 0000:ffff88003daadcc8 EFLAGS: 00010046
[ 3.720056] RAX: 0000000000000246 RBX: ffff88003d803a60 RCX: 000000000000053e
[ 3.720056] RDX: 000060ffc0002358 RSI: 0000000000000001 RDI: ffff88003c4e822c
[ 3.720056] RBP: ffff88003daadd20 R08: ffff88003cc55000 R09: 0000000000000004
[ 3.720056] R10: ffff88003c4e8000 R11: 0000000000000001 R12: 0000000000000000
[ 3.720056] R13: ffffea0000e0e980 R14: ffff88003c4e8000 R15: 0000000000000001
[ 3.720056] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 3.720056] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.720056] CR2: 000060ffc0002370 CR3: 0000000036754000 CR4: 00000000001406f0
[ 3.720056] Stack:
[ 3.720056] ffffffff811a7bc8 ffff88003fffb780 ffffea0000e0e980 ffff88003cc55000
[ 3.720056] ffff88003c4e8000 ffff88003c4e822c ffff880036c1da00 ffffea0000e0e980
[ 3.720056] ffff88003fffbcc0 ffff88003d803a60 ffffea0000e0e9a0 ffff88003daadda8
[ 3.720056] Call Trace:
[ 3.720056] [<ffffffff811a7bc8>] ? mem_cgroup_page_lruvec+0x28/0x90
[ 3.720056] [<ffffffff811a8427>] mem_cgroup_reparent_charges+0x257/0x460
[ 3.720056] [<ffffffff811a87df>] mem_cgroup_css_offline+0xaf/0x220
[ 3.720056] [<ffffffff810de897>] offline_css+0x27/0x50
[ 3.720056] [<ffffffff810e199d>] css_killed_work_fn+0x2d/0xa0
[ 3.720056] [<ffffffff81081032>] process_one_work+0x182/0x450
[ 3.720056] [<ffffffff81081dc1>] worker_thread+0x121/0x410
[ 3.720056] [<ffffffff81081ca0>] ? rescuer_thread+0x3d0/0x3d0
[ 3.720056] [<ffffffff81088ba0>] kthread+0xc0/0xd0
[ 3.720056] [<ffffffff81088ae0>] ? kthread_create_on_node+0x120/0x120
[ 3.720056] [<ffffffff816ff4fc>] ret_from_fork+0x7c/0xb0
[ 3.720056] [<ffffffff81088ae0>] ? kthread_create_on_node+0x120/0x120
[ 3.720056] Code: d6 00 55 00 4d 85 e4 4c 8b 55 c8 4c 8b 45 c0 0f 85 a5 00 00 00 41 8b 55 18 85 d2 0f 88 99 00 00 00 49 8b 96 30 02 00 00 45 89 fb <4c> 39 5a 18 0f 8c c2 00 00 00 44 89 f9 f7 d9 89 ce 65 48 01 72
[ 3.720056] RIP [<ffffffff811a6ae1>] mem_cg...


Joseph Salisbury (jsalisbury) wrote :

Thanks for testing. I skipped that commit in the bisect, in case it's not related to the bug.

I built the next test kernel, up to the following commit:
cbbc58d4fdfab1a39a6ac1b41fcb17885952157a

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Same crash on that version.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
3b7834743f9492e3509930feb4ca47135905e640

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

That version has also crashed.

mm (mtl-0) wrote :

This bug is also affecting me on two identical Xubuntu 15.10 systems:

uname -a: ### 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
 cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"

Also, echo 1 or echo 3 > /proc/sys/vm/drop_caches temporarily solves the issue.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
d7876f1be40a16223a44355740de625849504eb5

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Crashed again.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
732e563373ffc57d38a8a3b6d55f2de865182117

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Crashed again.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
56aba608257b451f663d25313d5ecae134d5557f

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Crashed again.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
59ab5a8f4445699e238c4c46b3da63bb9dc02897

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Crashed again.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
98fda169290b3b28c0f2db2b8f02290c13da50ef

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote :

Crashed again.

Changed in linux (Ubuntu):
status: In Progress → Confirmed
assignee: Joseph Salisbury (jsalisbury) → nobody
[129 comments hidden; 209 comments in total]

(In reply to Anatoli Sakhnik from comment #13)
> Mine is 2G. I didn't change anything in the kernel source code, but switched
> off many options in the config file:
> https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .
>
> Even today, if I boot stock arch kernel, the bug regresses; if I boot
> linux-c720, kswapd0 is still. In theory, I could experiment with different
> configurations in between stock's and mine to triage the issue.

Could you please share your kernel configuration so I can try your AUR package and solve this issue once and for all? :) Thanks in advance.

We encounter this regularly on AWS, but only on t2.small instances, which indeed are the only ones we run which have 2GB of RAM.

We use the latest Ubuntu 15.10 AMIs as found here https://cloud-images.ubuntu.com/locator/ec2/. Please let me know if we can do anything to help track this down.

The workaround suggested above (echo 3 > /proc/sys/vm/drop_caches) doesn't work consistently for me on kernel 4.2.0 (Ubuntu 15.10) on an Acer C720 Chromebook.

I've found another workaround that works well for me so far: create a file /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents and reboot:
vm.min_free_kbytes=67584

This workaround is based on a post by Kirill A. Shutemov on LKML (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this GalliumOS bug report: https://github.com/GalliumOS/galliumos-distro/issues/52

It would be interesting to know whether this helps others.
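To put 67584 in context: as far as I know, 4.x kernels compute the default `vm.min_free_kbytes` at boot (in mm/page_alloc.c) as the integer square root of (lowmem in kB times 16), clamped to the range 128..65536, and transparent hugepage support may raise it further. A C sketch of that formula only; using total RAM as the "lowmem" input is an approximation (the kernel uses its count of free low-memory pages at boot), and the function names are hypothetical:

```c
/* Floor of the square root of x, standing in for the kernel's int_sqrt(). */
static unsigned long long isqrt(unsigned long long x)
{
    unsigned long long lo = 0, hi = x < 0xFFFFFFFFULL ? x : 0xFFFFFFFFULL;

    /* Binary search for the largest lo with lo * lo <= x
     * (written as lo <= x / lo to avoid overflow). */
    while (lo < hi) {
        unsigned long long mid = lo + (hi - lo + 1) / 2;
        if (mid <= x / mid)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Approximate default vm.min_free_kbytes for lowmem_kbytes of low memory:
 * sqrt(lowmem_kbytes * 16), clamped to [128, 65536]. Ignores any later
 * adjustment made by transparent hugepage support. */
unsigned long default_min_free_kbytes(unsigned long lowmem_kbytes)
{
    unsigned long long v = isqrt((unsigned long long)lowmem_kbytes * 16);

    if (v < 128)
        v = 128;
    if (v > 65536)
        v = 65536;
    return (unsigned long)v;
}
```

For about 1 GB (MemTotal 1014936 kB, as on the t2.micro instances in this thread) the formula alone gives 4029 kB, and about 5792 kB for 2 GB, so 67584 kB is more than ten times the formula's value on such machines unless something else has already raised it.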


Same problem here:
- No swap machine
- Wily (U15.10) - 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- 1GB RAM

- `meminfo`: there should be enough RAM to avoid swapping, though Buffers does seem high

MemTotal: 1014932 kB
MemFree: 231296 kB
MemAvailable: 871180 kB
Buffers: 580684 kB
Cached: 47812 kB
SwapCached: 0 kB
Active: 547952 kB
Inactive: 164364 kB
Active(anon): 84280 kB
Inactive(anon): 4288 kB
Active(file): 463672 kB
Inactive(file): 160076 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 224 kB
Writeback: 0 kB
AnonPages: 83800 kB
Mapped: 39688 kB
Shmem: 4768 kB
Slab: 48008 kB
SReclaimable: 31172 kB
SUnreclaim: 16836 kB
KernelStack: 1936 kB
PageTables: 3844 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 507464 kB
Committed_AS: 314640 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 13524 kB
VmallocChunk: 34359717628 kB
HardwareCorrupted: 0 kB
AnonHugePages: 49152 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 53248 kB
DirectMap2M: 1126400 kB

- kernel config: https://gist.github.com/sgnn7/cbb41ce21d3a927eca27

- strace shows nothing interesting

- `perf` report:
Samples: 12K of event 'cpu-clock', Event count (approx.): 3245250000
Overhead Command Shared Object Symbol
  19.34% kswapd0 [kernel.kallsyms] [k] shrink_lruvec
  17.04% kswapd0 [kernel.kallsyms] [k] mem_cgroup_iter
   8.60% kswapd0 [kernel.kallsyms] [k] mem_cgroup_zone_lruvec
   6.57% kswapd0 [kernel.kallsyms] [k] shrink_slab
   5.47% kswapd0 [kernel.kallsyms] [k] global_dirty_limits ...


Cont'd from previous post

In order of attempts on a live system:
- gdb didn't work at all, since the kernel wasn't built with debugging information
- setting swappiness to 10 and to 0 at runtime (from 60) didn't reduce kswapd's CPU usage
- setting vm.min_free_kbytes=64K at runtime (from 4K) didn't reduce its CPU usage
- setting vm.dirty_background_ratio=5 at runtime (from 10) didn't reduce its CPU usage
- setting vm.dirty_ratio=10 at runtime (from 20) didn't reduce its CPU usage
- setting vm.dirty_background_ratio=15 at runtime (from 5) didn't reduce its CPU usage
- setting vm.dirty_ratio=25 at runtime (from 10) didn't reduce its CPU usage
- a live swapon of a new 256MB swapfile didn't reduce its CPU usage
- a live swapoff followed by swapon didn't reduce its CPU usage either
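Changing a sysctl at runtime as above amounts to writing the new value to the matching file under /proc/sys (which is what `sysctl -w` does), so it can help to record the current values before and after each attempt. A minimal C sketch, assuming simple dotted names such as the vm.* ones (names that themselves contain dots or slashes need extra handling that is not done here); the function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

#define PROC_SYS "/proc/sys/"

/* Map a dotted sysctl name such as "vm.dirty_ratio" to its procfs path,
 * "/proc/sys/vm/dirty_ratio". Returns 0 on success, -1 if out is too small. */
int sysctl_to_path(const char *name, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", PROC_SYS, name);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    /* Only the part after the /proc/sys/ prefix uses dots as separators. */
    for (char *p = out + strlen(PROC_SYS); *p != '\0'; ++p)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Read an integer sysctl such as "vm.swappiness".
 * Returns 0 and stores the value on success, -1 on error. */
int read_sysctl_long(const char *name, long *value)
{
    char path[256];
    FILE *f;
    int ok;

    if (sysctl_to_path(name, path, sizeof path) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fscanf(f, "%ld", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, calling `read_sysctl_long("vm.min_free_kbytes", &v)` before and after each change gives a log of what was actually in effect at the time.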

Sidenote: We're using Docker so I'm not sure if that is contributing to the situation.

Good news! I was able to get rid of the bug completely by setting the `mem` kernel parameter to a value slightly less than physical memory. I own an Acer C720 (2GB model), and setting `mem=1920M` does the job.

The idea came to me after reading the aforementioned bug report on GitHub[1]. I hope this gives some clue to the issue.

[1]: https://github.com/GalliumOS/galliumos-distro/issues/52

Created attachment 208411
ftrace (function_graph)

Created attachment 208421
ftrace (vmscan tracepoints)

Created attachment 208431
/proc/vmstat (time 0)

Created attachment 208441
/proc/vmstat (time 5s)

Created attachment 208451
/proc/zoneinfo

Created attachment 208461
/proc/pagetypeinfo

Created attachment 208471
/proc/buddyinfo

Created attachment 208481
vmstat -m (time 0)

Created attachment 208491
vmstat -m (time 5s)

I am able to semi-reliably reproduce this (or a very similar?) problem on a setup very close to the one in comment #21:

- kernel: 4.2.0-30-generic (ubuntu 15.10)
- 2 GB RAM, 1 CPU, running under Xen (EC2 t2.small instance)
- docker with LVM thin-pool storage backend, running 3 containers, no memory limits set for their memcg's
- server is mostly idling (load average 0.0-0.1)

To reproduce it I have to:

1. set vm.overcommit_memory=1
2. initiate some disk activity:
     find / -xdev -type f -print0 | xargs -0 -P10 -n1 md5sum &>/dev/null &
     find /var/lib/docker -type f -print0 | xargs -0 -P10 -n1 md5sum &>/dev/null &

3. run some memory allocations until you hit OOM:
    for x in {1..200}; do ./memalloc & : ; done

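memalloc above is a simple C program that allocates 100 MiB, fills it with 'x' using memset, and holds it for three minutes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  /* Size of the block to allocate, in MiB. */
  const size_t block_mb = 100;
  const size_t block_bytes = block_mb * 1024 * 1024;
  char *buf;

  printf("allocing %zuMB: ", block_mb);
  fflush(stdout); /* show progress even if the OOM killer strikes later */
  buf = malloc(block_bytes);
  if (buf == NULL) {
    printf("FAILED!\n");
    return EXIT_FAILURE;
  }
  printf("ok\n");

  /* Touch every page so the memory is actually committed, not just reserved. */
  memset(buf, 'x', block_bytes);

  /* Hold the memory for three minutes to keep pressure on the system. */
  sleep(180);
  free(buf);
  return 0;
}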

Once you hit OOM and the console slows down, it is time to press Ctrl+C, pkill memalloc and then check top. Many times `kswapd0` spins and then recovers within tens of seconds, but once in a while it stays there for hours (I didn't have the patience to check for longer).

Once I had triggered the bug, I tried to get as much information as possible from the running system. I am attaching /proc/*info files (some taken 5 s apart), ftrace output for the event tracer (vmscan events only), and ftrace output for the function_graph tracer. Let me know if you need more information.

To recover from this situation, enough memory has to be freed in a short period of time. Sometimes dropping caches helps; sometimes I also needed to close applications/containers, but I never had to reboot to recover.

It would be very helpful if there were a way to get output similar to the ftrace function_graph tracer, but with function arguments and return values. From the look of it, `pgdat_balanced` for some reason keeps returning false even though /proc/zoneinfo shows that the number of free pages is much higher than any watermark.
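The observation above can be checked mechanically by comparing each zone's "pages free" count with its min/low/high watermarks (all in pages) in /proc/zoneinfo. A rough C sketch, assuming the 4.x zoneinfo layout ("Node N, zone NAME" headers followed by "pages free", "min", "low", "high" and "managed" lines); the struct and function names are hypothetical. The kernel's own balance check also depends on the allocation order, so free > high here does not by itself prove kswapd should be idle. The sketch also records `managed`, which matters for the small-Normal-zone question raised later in this thread:

```c
#include <stdio.h>
#include <string.h>

/* One zone's counters from /proc/zoneinfo; all values are in pages. */
struct zone_wm {
    char name[32];
    unsigned long free, min, low, high, managed;
};

/* Parse /proc/zoneinfo-formatted text into at most max zones.
 * Returns the number of zones stored (zones beyond max are ignored). */
int parse_zoneinfo(const char *text, struct zone_wm *zones, int max)
{
    struct zone_wm *z = NULL;
    int count = 0;

    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);
        char line[256], name[32];
        int node;
        unsigned long v;

        /* Copy one line so sscanf cannot run into the next one. */
        if (len >= sizeof line)
            len = sizeof line - 1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += nl ? (size_t)(nl - text) + 1 : strlen(text);

        if (sscanf(line, "Node %d, zone %31s", &node, name) == 2) {
            z = count < max ? &zones[count++] : NULL;
            if (z != NULL) {
                memset(z, 0, sizeof *z);
                strcpy(z->name, name);
            }
        } else if (z == NULL) {
            continue;
        } else if (sscanf(line, " pages free %lu", &v) == 1) {
            z->free = v;
        } else if (sscanf(line, " min %lu", &v) == 1) {
            z->min = v;
        } else if (sscanf(line, " low %lu", &v) == 1) {
            z->low = v;
        } else if (sscanf(line, " high %lu", &v) == 1) {
            z->high = v;
        } else if (sscanf(line, " managed %lu", &v) == 1) {
            z->managed = v;
        }
    }
    return count;
}

/* Nonzero if the zone's free pages exceed its high watermark
 * (an order-0 view only). */
int zone_above_high(const struct zone_wm *z)
{
    return z->free > z->high;
}
```

To use it on a live system, read /proc/zoneinfo into a NUL-terminated buffer and pass it in; per-CPU pageset lines such as "high:  0" are not matched because of the colon.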

Problem description and recovery method very closely resembles discussion around kernel 3.7 (https://lkml.org/lkml/2012/11/28/88):

> The zonelist reclaim in kswapd would do
> nothing because all high watermarks are met, but the compaction logic
> would find its own requirements unmet and loop over the zones again.
> Indefinitely, until some third party would free enough memory to help
> meet the higher compaction watermark.

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.
> Dropping caches *does* help:
>
> # echo 3 > /proc/sys/vm/drop_caches # 1 isn't enough
>
> Next my guess would be to try to deactivate zswap.

The above workaround works for me on kernel 4.4.2, Debian jessie.

With kernel 4.5 the bug happens randomly after heavy web browser use.
After downgrading to the stable jessie kernel 3.16, the bug was gone.
After upgrading to 4.4.2 again, the bug came back.

Same thing on a ThinkPad X220 with 8 GB RAM running Ubuntu 14.04, with Ubuntu's kernel 3.16.0-77-generic.

Swap is disabled.

kswapd0 runs at high CPU and the HDD light is on the whole time (no idea why).

After 20 (!) minutes the OOM killer manages to kill a process and resolve the situation.

Same problem on Amazon's t2.nano instance (512MB of RAM). Seemed to be triggered by doing a bunch of file IO. This is a brand new install of Ubuntu 16.04. I have no swap enabled, and yet:

top - 06:42:57 up 1:58, 1 user, load average: 2.43, 2.66, 2.31
Tasks: 125 total, 3 running, 122 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.1 us, 6.9 sy, 0.0 ni, 0.0 id, 0.9 wa, 0.0 hi, 0.0 si, 90.1 st
KiB Mem : 498416 total, 348096 free, 49772 used, 100548 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 411900 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   29 root      20   0       0      0      0 R  65.0  0.0 103:16.64 kswapd0
14343 root      20   0       0      0      0 R   2.9  0.0   0:00.82 python

Running "echo 1 > /proc/sys/vm/drop_caches" didn't fix the problem, but running it with "3" fixed it immediately.

Also, my /tmp isn't full at all (6.5GB / 85% left on root).
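Per the kernel's documentation of the vm sysctls, writing 1 to drop_caches frees the page cache, 2 frees reclaimable slab objects (dentries and inodes), and 3 frees both; the documentation also recommends running `sync` first, since dirty objects cannot be freed. So "3" additionally drops the dentry and inode caches, which "1" leaves alone. A minimal C sketch of the same operation; the enum and function names are hypothetical, and the path is a parameter only so the function can be tried against an ordinary file (in real use it is /proc/sys/vm/drop_caches, written as root):

```c
#include <stdio.h>
#include <unistd.h>

/* Values accepted by /proc/sys/vm/drop_caches. */
enum drop_mode {
    DROP_PAGECACHE = 1, /* page cache */
    DROP_SLAB = 2,      /* reclaimable slab: dentries and inodes */
    DROP_ALL = 3        /* both */
};

/* Write back dirty data, then ask the kernel to drop clean caches.
 * Returns 0 on success, -1 on error or an invalid mode. */
int drop_caches(const char *path, enum drop_mode mode)
{
    FILE *f;

    if (mode < DROP_PAGECACHE || mode > DROP_ALL)
        return -1;
    sync(); /* dirty pages are not droppable, so flush them first */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%d\n", (int)mode) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

`drop_caches("/proc/sys/vm/drop_caches", DROP_ALL)` run as root is the equivalent of `sync; echo 3 > /proc/sys/vm/drop_caches`. Dropping caches is non-destructive but only a temporary relief; it does not address why kswapd keeps spinning.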

A workaround for machines running under Xen has been found over on Ubuntu's bug tracker, see comment #69:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

The workaround is to disable hot-add of memory:

touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

I tried the same Ubuntu-inspired "disable hot-add of memory" (and CPU) workaround under AWS EC2 HVM, CentOS 7.x with the mainline (elrepo) 4.4.15 kernel: no such luck, I still see this occasionally.

Dan Streetman (ddstreet) on 2016-10-01
Changed in linux (Ubuntu):
assignee: nobody → Dan Streetman (ddstreet)

I detailed why this bug happens here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126

this appears to be fixed by Mel Gorman's patch series to change memory reclaim from "per zone" to "per node":
https://marc.info/?l=linux-mm&m=146797052519026

So this bug should be fixed with the latest kernel.

(In reply to Dan Streetman from comment #40)
> I detailed why this bug happens here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
>
> So this bug should be fixed with the latest kernel.

Can you clarify, the link you mention seems to talk mainly about Xen. Do you think the latest kernel will fix it also for non-Xen machines?

(In reply to mail+kernel-bugzilla from comment #41)
> (In reply to Dan Streetman from comment #40)
> > I detailed why this bug happens here:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
> >
> > So this bug should be fixed with the latest kernel.
>
> Can you clarify, the link you mention seems to talk mainly about Xen. Do you
> think the latest kernel will fix it also for non-Xen machines?

What does your /proc/zoneinfo look like? Do you have a system with approximately 4 GB of RAM or less, and a Normal zone with few managed pages?

(In reply to Dan Streetman from comment #42)
> what does your /proc/zoneinfo look like? do you have a system with (approx)
> <= 4g and Normal zone with few managed pages?

My zoneinfo file right now looks like this: https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94

(I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment #36.)

(In reply to mail+kernel-bugzilla from comment #43)
> (In reply to Dan Streetman from comment #42)
> > what does your /proc/zoneinfo look like? do you have a system with
> (approx)
> > <= 4g and Normal zone with few managed pages?
>
> My zoneinfo file right now looks like this:
> https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94
>
> (I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment
> #36.)

That zoneinfo doesn't look like you're seeing the same problem, so if you are seeing consistent, sustained (not just transient) 100% cpu from kswapd, I think it's a different problem from what I described in comment 40.

Seth Forshee (sforshee) on 2016-10-12
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Dan Streetman (ddstreet)
Seth Forshee (sforshee) on 2016-10-12
Changed in linux (Ubuntu Yakkety):
status: Confirmed → Fix Committed
Andy Whitcroft (apw) on 2016-10-13
Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Invalid
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released

I'm assuming by latest kernel you mean 4.8? If so I'm looking forward to Arch pushing it through testing :)

Seth Forshee (sforshee) on 2016-10-18
tags: added: verification-needed-yakkety
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Changed in linux (Ubuntu Yakkety):
status: Invalid → Fix Released

I am having the same issue on Fedora 24 with kernel 4.8.6, so I guess the fix has not been pushed there, or it does not fix anything.
It is a huge blocker, as I need to transfer many files between two USB disks.
kswapd0 appears at the top of the process list after a while and slowly degrades overall performance until I have to hard reboot the machine in the middle of a transfer.

My guess is Fedora didn't put the changes through or something, because 4.8 has DEFINITELY fixed it for me. I used to have to reboot about twice daily due to this, but ever since I upgraded to 4.8 it hasn't happened once.

I'm on openSUSE with 4.8.8 and still have this issue.

Changed in linux (Ubuntu):
status: Invalid → Fix Released

I'm on Debian with 4.8.7 and still have this issue.

4.8.13-100.fc23.i686+PAE #1
/dev/sda is Samsung SSD 850 EVO 250GB

swapoff -va
sysctl vm.drop_caches=3

Problem: these always cause heavy kswapd0 load:
  cat /dev/sda >> /dev/zero
  hdparm -t /dev/sda
  ddrescue /dev/sda /dev/zero -vf
  hexdump /dev/sda
  dd if=/dev/sda of=/dev/zero
  etc.

No problem (read speed ~500MB/s, except hdparm):
  hdparm --direct -t /dev/sda
  dd iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
  ddrescue --direct /dev/sda /dev/zero -vf -b 4096 -c 8192

I am not sure if this is the same bug, but for me kswapd0 goes high-cpu following a page allocation failure in xhci_segment_alloc and I think that this has been occurring since moving to 4.8 on Fedora 24. I don't remember experiencing it before that. Currently on 4.8.15.

I normally boot with 3 or 4 USB 3.0 disks attached and, after the upgrade to 4.8.x noticed that kswapd0 was running at 100%. I went back to 4.7.x and no problem. Searches on this issue frequently referred to USB disks so I unplugged and rebooted.

If I unplug all of my USB 3.0 devices I get a normal boot, even with a USB weather station, keyboard and mouse attached. Sometimes one or two USB 3.0 disks are OK too. If I boot with all of the USB 3.0 disks included, I get a kworker page allocation failure, and after boot kswapd0 is at high CPU, usually split across 2-4 cores.

If I boot with two USB 3.0 disks and get a normal boot (no page allocation failure and normal kswapd) and then plug in a hub with the rest of the disks (and a USB 3.0 card reader) I get the page allocation failure at that point and kswapd0 goes high-cpu.

I have not looked at them all, but whenever I see kswapd0 high-cpu and I do look, there is the page allocation failure in the log.

The 'perf top' command shows different information from time to time, but the top contenders are frequently 'shrink_inactive_list', 'inactive_list_is_low', 'find_next_bit', 'shrink_node_memcg' and '_raw_spin_lock', to name a few.

Makes me wonder if the xhci allocation failure is the trigger, and fails to clean up on the error exit path, and kswapd0 is just a hapless victim. There is a stack trace (on ubuntu kernel) of the page allocation failure in the dmesg attached to https://bugzilla.redhat.com/show_bug.cgi?id=1395825 on this issue but I have more if it would help.

I have 19GiB free on a 24GiB machine so there should be no memory shortage to prompt swapping or the page allocation failure.

I had also noticed frequently that not all of my USB disks were mounted after boot and that I had to remove and reinsert a disk to use it. IIRC this affected my USB 2.0 disks too and from before the upgrade to 4.8 too.

> Problem, causes always heavy kswapd0 load:
> cat /dev/sda >> /dev/zero
> hdparm -t /dev/sda
> ddrescue /dev/sda /dev/zero -vf
> hexdump /dev/sda
> dd if=/dev/sda of=/dev/zero
> etc.

Of course those cause kswapd work: all those commands fill your page cache, and kswapd is responsible for clearing those pages out.

kswapd running isn't a problem, if it's doing work. kswapd running *without* doing work is the problem. When you stop running those commands, does kswapd catch up and stop using cpu? If so, that's normal. If not, and it never stops using cpu, that's the problem.
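One way to apply that test is to sample kswapd0's accumulated CPU time a few seconds apart once the I/O has finished: if it keeps climbing at close to 100% of a CPU long after the workload stopped, kswapd is spinning rather than catching up. A C sketch that reads utime and stime (fields 14 and 15, in clock ticks) from /proc/<pid>/stat; the function names are hypothetical, and the PID can be found with `pgrep -x kswapd0`:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Parse utime + stime (fields 14 and 15 of /proc/<pid>/stat, in clock
 * ticks) from one stat line. Returns 0 on success, -1 on parse failure. */
int stat_cpu_ticks(const char *stat_line, unsigned long long *ticks)
{
    /* The command name is in parentheses and may contain spaces or ')',
     * so start parsing after the last ')'. */
    const char *p = strrchr(stat_line, ')');
    unsigned long long utime, stime;

    if (p == NULL)
        return -1;
    /* Skip fields 3-13: state, ppid, pgrp, session, tty_nr, tpgid,
     * flags, minflt, cminflt, majflt, cmajflt. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %llu %llu",
               &utime, &stime) != 2)
        return -1;
    *ticks = utime + stime;
    return 0;
}

/* Read the CPU ticks consumed so far by process or kernel thread `pid`. */
int read_pid_ticks(int pid, unsigned long long *ticks)
{
    char path[64], line[1024];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return stat_cpu_ticks(line, ticks);
}

/* Sample `pid` twice, `seconds` apart, and return its CPU usage as a
 * percentage of one CPU, or -1.0 on error. */
double sample_cpu_percent(int pid, unsigned int seconds)
{
    unsigned long long before, after;
    long hz = sysconf(_SC_CLK_TCK);

    if (hz <= 0 || seconds == 0 || read_pid_ticks(pid, &before) != 0)
        return -1.0;
    sleep(seconds);
    if (read_pid_ticks(pid, &after) != 0)
        return -1.0;
    return 100.0 * (double)(after - before) / ((double)hz * seconds);
}
```

Calling `sample_cpu_percent(pid, 5)` repeatedly after the copy has finished should show the value dropping towards zero in the normal case; a value that stays near 100 for minutes matches the stuck behaviour described in this bug.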

> No problem (read speed ~500MB/s, except hdparm ):
> hdparm --direct -t /dev/sda
> dd iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
> ddrescue --direct /dev/sda /dev/zero -vf -b 4096 -c 8192

The difference is that those commands bypass the page cache, so the page cache doesn't fill up and kswapd doesn't need to clear it out.
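For comparison, a C sketch of what the `iflag=direct` / `--direct` variants do: open the source with O_DIRECT so reads bypass the page cache. O_DIRECT needs buffers (and usually offsets and sizes) aligned to the device's logical block size; the 4096-byte alignment and 1 MiB chunk size below are assumptions that suit most disks. Where a filesystem rejects O_DIRECT, the sketch falls back to ordinary reads followed by posix_fadvise(POSIX_FADV_DONTNEED), which asks the kernel to drop the now-clean cached pages. The function names are hypothetical:

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096          /* assumed logical block size (or a multiple of it) */
#define CHUNK (1024 * 1024) /* bytes per read(); a multiple of ALIGN */

/* Read until EOF; returns bytes read or -1 on error (errno is set). */
static long long read_all(int fd, void *buf, int direct)
{
    long long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, CHUNK);
        if (n < 0)
            return -1;
        total += n;
        /* With O_DIRECT a short read means EOF, and reading again from the
         * now-unaligned offset could fail with EINVAL, so stop there. */
        if (n == 0 || (direct && n < CHUNK))
            return total;
    }
}

/* Read a whole file or block device while keeping it out of the page
 * cache as far as possible. Returns bytes read, or -1 on error. */
long long read_uncached(const char *path)
{
    void *buf;
    long long total;
    int fd, err;

    if (posix_memalign(&buf, ALIGN, CHUNK) != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        total = read_all(fd, buf, 1);
        err = errno;
        close(fd);
        if (total >= 0 || err != EINVAL) {
            free(buf);
            return total;
        }
    } else if (errno != EINVAL) {
        free(buf);
        return -1;
    }

    /* O_DIRECT not supported here: fall back to buffered reads, then ask
     * the kernel to drop the clean pages this pulled into the cache. */
    total = -1;
    fd = open(path, O_RDONLY);
    if (fd >= 0) {
        total = read_all(fd, buf, 0);
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
    }
    free(buf);
    return total;
}
```

Run as root against /dev/sda, this should behave like `dd iflag=direct if=/dev/sda of=/dev/zero`, leaving the page cache (and therefore kswapd) largely untouched.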

> I am not sure if this is the same bug, but for me kswapd0 goes high-cpu
> following a page allocation failure in xhci_segment_alloc and I think that
> this has been occurring since moving to 4.8 on Fedora 24

From your dmesg, it certainly doesn't look like the same bug.

(In reply to Dan Streetman from comment #52)

> of course those cause kswapd work, all those commands will fill your page
> cache and kswapd is responsible for clearing those pages out.
>
> kswapd running isn't a problem, if it's doing work. kswapd running
> *without* doing work is the problem. When you stop running those commands,
> does kswapd catch up and stop using cpu? If so, that's normal. If not, and
> it never stops using cpu, that's the problem.

But why does kswapd write to storage so aggressively when there is no data to flush (no swap configured)?

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed

I reproduced the bug on the most recent kernel. I have extracted sysctl, meminfo and dmesg logs; please see my comments and attachments on the same bug: https://bugzilla.kernel.org/show_bug.cgi?id=110501#c15
I also wrote a simple Python script that eats RAM and reproduces the bug 100% of the time for me.

Petr Doležal (elkir) wrote :

Recently I got a new laptop with a fresh install of Ubuntu 18.04, now updated to kernel 4.19.21-041921-generic.

It still happens even with 16 GB of memory and 4 GB more of swap on an SSD. What is more dangerous is that, now that the laptop has hyperthreading, it goes absolutely crazy and heats up insanely. At that point the system is so frozen that no commands can fix it, as one never gets to type them. Restarting the machine with the power button is the only option.
