Bug #1518457 “kswapd0 100% CPU usage” : Yakkety (16.10) : Bugs : linux package : Ubuntu

Revision history for this message

In Linux Kernel Bug Tracker #65201, nleo (nleo-linux-kernel-bugs) wrote on 2013-11-19:

#153

kswapd0 randomly loads one CPU core at 100%.

Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013 x86_64 GNU/Linux

No swap enabled

Previously the same laptop ran Ubuntu 12.04 with a 32-bit PAE 3.2 kernel, and it had no such problem.

[root@localhost ~]# free -mh
             total       used       free     shared    buffers     cached
Mem:          3.8G       2.4G       1.3G         0B       150M       508M
-/+ buffers/cache:       1.8G       2.0G
Swap:           0B         0B         0B

[root@localhost ~]# cat /proc/meminfo
MemTotal: 3935792 kB
MemFree: 1381360 kB
Buffers: 154216 kB
Cached: 533096 kB
SwapCached: 0 kB
Active: 1958896 kB
Inactive: 438004 kB
Active(anon): 1740916 kB
Inactive(anon): 136292 kB
Active(file): 217980 kB
Inactive(file): 301712 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 2064 kB
Writeback: 0 kB
AnonPages: 1709628 kB
Mapped: 196696 kB
Shmem: 167620 kB
Slab: 81516 kB
SReclaimable: 61312 kB
SUnreclaim: 20204 kB
KernelStack: 1696 kB
PageTables: 13088 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1967896 kB
Committed_AS: 3498576 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 361304 kB
VmallocChunk: 34359300731 kB
HardwareCorrupted: 0 kB
AnonHugePages: 157696 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 18476 kB
DirectMap2M: 4059136 kB

And I can't kill it. I heard that it's not a good idea, but I tried just for lulz.

In Linux Kernel Bug Tracker #65201, atomlin (atomlin-linux-kernel-bugs) wrote on 2013-11-20:

#154

(In reply to nleo from comment #0)
> kswapd0 randomly loads one CPU core at 100%.

You cannot issue a SIGKILL to 'kswapd' since it is
a kernel thread.

> CommitLimit: 1967896 kB
> Committed_AS: 3498576 kB
^^^^^^^

The system seems to be overcommitting memory (Committed_AS exceeds CommitLimit).
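
For reference, the comparison being pointed at here can be reproduced from any /proc/meminfo snapshot. The sketch below is only illustrative: the function name `commit_ratio` is made up, and it reads meminfo-formatted text on stdin so that it can be run against the numbers quoted above. Note that CommitLimit is only enforced when vm.overcommit_memory is set to 2, so exceeding it under the default heuristic mode is common and not by itself an error.

```shell
# Hypothetical helper: print Committed_AS as a whole percentage of CommitLimit.
# Reads /proc/meminfo-style text on stdin.
commit_ratio() {
    awk '
        $1 == "CommitLimit:"  { limit = $2 }
        $1 == "Committed_AS:" { committed = $2 }
        END {
            # Skip the division if CommitLimit was missing or zero.
            if (limit > 0)
                printf "%d\n", committed * 100 / limit
        }'
}

# Usage on a live system:
#   commit_ratio < /proc/meminfo
```

With the values from the report (CommitLimit 1967896 kB, Committed_AS 3498576 kB) this prints 177, i.e. roughly 1.8 times the limit.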

In Linux Kernel Bug Tracker #65201, akpm (akpm-linux-kernel-bugs) wrote on 2013-11-22:

#155

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 19 Nov 2013 19:40:40 +0000 <email address hidden> wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=65201
>
> Bug ID: 65201
> Summary: kswapd0 randomly high cpu load
> Product: Memory Management
> Version: 2.5
> Kernel Version: 3.12
> Hardware: x86-64
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: <email address hidden>
> Reporter: <email address hidden>
> Regression: No
>
> kswapd0 randomly loads one CPU core at 100%.
>
> Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013
> x86_64
> GNU/Linux
>
> No swap enabled
>
> Previously the same laptop ran Ubuntu 12.04 with a 32-bit PAE 3.2 kernel,
> and it had no such problem.
>
> [root@localhost ~]# free -mh
>              total       used       free     shared    buffers     cached
> Mem:          3.8G       2.4G       1.3G         0B       150M       508M
> -/+ buffers/cache:       1.8G       2.0G
> Swap:           0B         0B         0B

hm, I wonder what kswapd is up to.

Could you please make it happen again and then

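dmesg -n 7                      # set the console log level to 7
dmesg -c                        # print, then clear, the kernel ring buffer
echo m > /proc/sysrq-trigger    # SysRq-m: dump current memory info to the log
echo t > /proc/sysrq-trigger    # SysRq-t: dump the list of tasks and their state
dmesg -s 1000000 > foo          # save the log to foo, using a 1000000-byte buffer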

then send us foo?

>
> [root@localhost ~]# cat /proc/meminfo
> MemTotal: 3935792 kB
> MemFree: 1381360 kB
> Buffers: 154216 kB
> Cached: 533096 kB
> SwapCached: 0 kB
> Active: 1958896 kB
> Inactive: 438004 kB
> Active(anon): 1740916 kB
> Inactive(anon): 136292 kB
> Active(file): 217980 kB
> Inactive(file): 301712 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 2064 kB
> Writeback: 0 kB
> AnonPages: 1709628 kB
> Mapped: 196696 kB
> Shmem: 167620 kB
> Slab: 81516 kB
> SReclaimable: 61312 kB
> SUnreclaim: 20204 kB
> KernelStack: 1696 kB
> PageTables: 13088 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 1967896 kB
> Committed_AS: 3498576 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 361304 kB
> VmallocChunk: 34359300731 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 157696 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 18476 kB
> DirectMap2M: 4059136 kB
>
> And I can't kill it. I heard that it's not a good idea, but I tried just for lulz.
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.

In Linux Kernel Bug Tracker #65201, mihail.zenkov (mihail.zenkov-linux-kernel-bugs) wrote on 2015-04-21:

#156

Created attachment 174671
kmsg dump

I sometimes have the same problem. I don't have swap: I run kernel 3.19.0 (i686) compiled without CONFIG_SWAP.

In Linux Kernel Bug Tracker #65201, sakhnik (sakhnik-linux-kernel-bugs) wrote on 2015-04-30:

#157

My Acer C720 suffers from this occasionally too. Turning swap on/off doesn't help. Dropping caches *does* help:

# echo 3 > /proc/sys/vm/drop_caches # 1 isn't enough

My next guess would be to try deactivating zswap.

In Linux Kernel Bug Tracker #65201, sakhnik (sakhnik-linux-kernel-bugs) wrote on 2015-05-03:

#158

Zswap isn't to blame, and dropping caches may or may not help. Here is the output of `sudo perf top`:

  26,24% [kernel] [k] _raw_spin_lock
  14,72% [kernel] [k] _raw_spin_unlock
   6,62% [kernel] [k] super_cache_count
   4,97% [kernel] [k] shrink_slab.part.12
   4,92% [kernel] [k] list_lru_count_one
   2,15% [i2c_designware_core] [k] 0x0000000000000099
   1,86% [kernel] [k] shrink_lruvec
   1,74% [kernel] [k] mem_cgroup_iter
   1,61% [kernel] [k] native_read_tsc
   1,55% [kernel] [k] delay_tsc
   1,52% [kernel] [k] kswapd
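
The symbols near the top of this profile (super_cache_count, shrink_slab, list_lru_count_one) belong to the slab shrinker path, which may be why writing 1 to drop_caches was not enough in the earlier comment. According to the kernel's vm sysctl documentation, 1 frees the page cache only, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. A small sketch of the workaround follows; the function name `drop_caches_meaning` is invented for illustration, and the actual write needs root.

```shell
# Hypothetical helper: describe what each drop_caches value frees,
# following the kernel's vm sysctl documentation.
drop_caches_meaning() {
    case "$1" in
        1) echo "page cache" ;;
        2) echo "reclaimable slab (dentries and inodes)" ;;
        3) echo "page cache and reclaimable slab" ;;
        *) echo "unknown value"; return 1 ;;
    esac
}

# The workaround itself (as root). Dirty pages cannot be dropped,
# so flush them to disk first with sync:
#   sync
#   echo 3 > /proc/sys/vm/drop_caches
```

This only relieves the symptom; the caches fill up again as the system keeps running.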

In Linux Kernel Bug Tracker #65201, ponymarzanna (ponymarzanna-linux-kernel-bugs) wrote on 2015-11-09:

#159

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 suffers from this occasionally too. Turning swap on/off doesn't help.

I have the same hardware. After a system upgrade (currently running kernel 4.2.0), I get high CPU usage after opening a "heavy" web site. If the suggested workaround (dropping caches) doesn't help, I just quit the web browser and everything returns to normal.

In Linux Kernel Bug Tracker #65201, samkostka (samkostka-linux-kernel-bugs) wrote on 2015-11-10:

#160

Same here, also on an Acer C720 running Arch. kswapd0 takes up a whole core whenever swap is being used. I run the Arch kernel with a small patch to the chromeos_laptop driver to enable my trackpad.

The weird thing is that neither memory nor swap is that full: memory is at 50% utilization and swap is only at 8%, according to xfce4-taskmanager. Google Docs seems to be the worst offender for triggering this issue.

In Linux Kernel Bug Tracker #65201, mvanross (mvanross-linux-kernel-bugs) wrote on 2015-11-16:

#161

I had this bug, and for me the cause turned out to be my /tmp directory,
which is a tmpfs (to gain speed and save my SSD).

df /tmp
gave
tmpfs 3880480 2449036 1431444 95% /tmp

After removing junk from /tmp/ the system returned to normal.

Also in my case I had no swap, and sufficient free memory.

Would be interested to know if this works for you.
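
To check whether a tmpfs mount is eating memory in the same way, something like the following could be used. It is a sketch and not part of the original report: the function name `tmpfs_usage` is made up, and it parses `df -P`-style output on stdin (filesystem, 1K-blocks, used, available, capacity, mount point). tmpfs pages are counted under Shmem in /proc/meminfo; with no swap they cannot be paged out, and dropping caches does not free them.

```shell
# Hypothetical helper: list tmpfs mount points with their used size in MiB.
# Expects `df -P` output (sizes in 1K blocks) on stdin.
tmpfs_usage() {
    awk '$1 == "tmpfs" { printf "%s %d MiB\n", $6, $3 / 1024 }'
}

# Usage on a live system:
#   df -P | tmpfs_usage
#   grep Shmem /proc/meminfo
```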

Sam Lade (sam-sentynel) wrote on 2015-11-20:

#1

CurrentDmesg.txt (35.3 KiB, text/plain; charset="utf-8")
Dependencies.txt (2.6 KiB, text/plain; charset="utf-8")
JournalErrors.txt (183.2 KiB, text/plain; charset="utf-8")
Lspci.txt (3.0 KiB, text/plain; charset="utf-8")
ProcCpuinfo.txt (799 bytes, text/plain; charset="utf-8")
ProcInterrupts.txt (1.7 KiB, text/plain; charset="utf-8")
ProcModules.txt (2.3 KiB, text/plain; charset="utf-8")
UdevDb.txt (92.1 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (61.1 KiB, text/plain; charset="utf-8")

Brad Figg (brad-figg) wrote on 2015-11-20: Status changed to Confirmed

#2

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

mecat (habdankm) wrote on 2015-11-22:

#3

same issue here:
root@orangepi:/var/log# uname -a
Linux orangepi 3.4.39 #2 SMP PREEMPT Mon Oct 12 12:03:03 CEST 2015 armv7l armv7l armv7l GNU/Linux
root@orangepi:/var/log# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"
Also,
echo 1 > /proc/sys/vm/drop_caches
temporarily solves the issue.

Joseph Salisbury (jsalisbury) wrote on 2015-11-23:

#4

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc2+cod1-wily/

Changed in linux (Ubuntu):
importance:	Undecided → Medium
status:	Confirmed → Incomplete
tags:	added: kernel-da-key

Sam Lade (sam-sentynel) wrote on 2015-11-24:

#5

This was a clean build, so I don't have any information about previous versions unfortunately. (The previous server, which didn't have this issue, was different AWS hardware and the previous Ubuntu version.)

I've tested with the latest mainline kernel and this is still occurring.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed
tags:	added: kernel-bug-exists-upstream

Joseph Salisbury (jsalisbury) on 2015-11-24

Changed in linux (Ubuntu):
status:	Confirmed → Triaged

Sean Groarke (sgroarke) wrote on 2015-11-26:

#6

Pretty much the same description here. It started when I upgraded an Amazon instance to 15.10.

It is causing a lot of disruption. I'm also available to test if that helps move us forward.

Joseph Salisbury (jsalisbury) wrote on 2015-11-30:

#7

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

4.0 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-vivid/
4.1 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1-wily/
4.2 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-wily/

You don't have to test every kernel, just up until the kernel that first has this bug. We can then narrow down further by testing some release candidates.

Thanks in advance!

Changed in linux (Ubuntu):
importance:	Medium → High

Sam Lade (sam-sentynel) wrote on 2015-11-30:

#8

Okay, I cloned my server and tried kernel versions. The latest version which does _not_ exhibit the issue is 3.12.51. The first which does is 3.13-rc1.

Joseph Salisbury (jsalisbury) wrote on 2015-12-01:

#9

Thanks for testing, Sam. Could you also test the 3.12 final version, since 3.13-rc1 is the next linear version after 3.12 final? The kernel can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-trusty/

Sam Lade (sam-sentynel) wrote on 2015-12-01:

#10

3.12 final doesn't exhibit the issue either.

Joseph Salisbury (jsalisbury) wrote on 2015-12-10:

#11

I started a kernel bisect between v3.12 final and v3.13-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
5cbb3d216e2041700231bcfc383ee5f8b7fc8b74

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
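
The estimate of 7-10 test kernels follows from how bisection works: each test halves the range of candidate commits, so N candidates need roughly log2(N) builds. The sketch below just works out that arithmetic. `bisect_steps` is a made-up helper, and the sample counts in the usage note are illustrative numbers, not the actual size of the v3.12..v3.13-rc1 range. The comments show the usual `git bisect` commands for a range like this one.

```shell
# Hypothetical helper: smallest number of halvings that narrows
# N candidate commits down to one, i.e. ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    span=1
    # Double the covered span until it reaches N, counting the doublings.
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Typical workflow in a kernel git tree:
#   git bisect start
#   git bisect bad v3.13-rc1
#   git bisect good v3.12
#   ...build, boot, and test the commit git checks out, then mark it:
#   git bisect good    # or: git bisect bad
#   git bisect skip    # when a commit cannot be tested (e.g. it crashes on boot)
```

For example, `bisect_steps 128` prints 7 and `bisect_steps 1000` prints 10. Skipped commits can add extra rounds on top of that.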

Sam Lade (sam-sentynel) wrote on 2015-12-10:

#12

No bug on that version.

Joseph Salisbury (jsalisbury) wrote on 2015-12-10:

#13

I built the next test kernel, up to the following commit:
e1f56c89b040134add93f686931cc266541d239a

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-10:

#14

No bug on that version.

Joseph Salisbury (jsalisbury) wrote on 2015-12-11:

#15

I built the next test kernel, up to the following commit:
9073e1a804c3096eda84ee7cbf11d1f174236c75

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-11:

#16

Bug is present in this version.

Joseph Salisbury (jsalisbury) on 2015-12-11

Changed in linux (Ubuntu):
assignee:	nobody → Joseph Salisbury (jsalisbury)
status:	Triaged → In Progress

Joseph Salisbury (jsalisbury) wrote on 2015-12-11:

#17

I built the next test kernel, up to the following commit:
ab0169bb5cc4a5c86756dde662087f9d12302eb0

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-13:

#18

No bug in that version.

Joseph Salisbury (jsalisbury) wrote on 2015-12-14:

#19

I built the next test kernel, up to the following commit:
f080480488028bcc25357f85e8ae54ccc3bb7173

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-15:

#20

Bug is present in this version.

Joseph Salisbury (jsalisbury) wrote on 2015-12-16:

#21

I built the next test kernel, up to the following commit:
b746f9c7941f227ad582b4f0bc981f3adcbc46b2

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-16:

#22

No bug in that version.

Joseph Salisbury (jsalisbury) wrote on 2015-12-16:

#23

I built the next test kernel, up to the following commit:
72c1253574a1854b0b6f196e24cd0dd08c1ad9b9

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-16:

#24

kswapd0-kernel-oops.txt (32.4 KiB, text/plain)

It's crashing on boot with this version. The crash is related to paging, so it might be relevant to the issue. I've attached the full dmesg; here's the actual crash:

[    3.716345] BUG: unable to handle kernel paging request at 000060ffc0002370
[    3.720056] IP: [<ffffffff811a6ae1>] mem_cgroup_move_account+0xd1/0x250
[    3.720056] PGD 0 
[    3.720056] Oops: 0000 [#1] SMP 
[    3.720056] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_recent xt_conntrack nf_conntrack iptable_filter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy pata_acpi
[    3.720056] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0-031200rc2-generic #201512161751
[    3.720056] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[    3.720056] Workqueue: events css_killed_work_fn
[    3.720056] task: ffff88003da946b0 ti: ffff88003daac000 task.ti: ffff88003daac000
[    3.720056] RIP: 0010:[<ffffffff811a6ae1>]  [<ffffffff811a6ae1>] mem_cgroup_move_account+0xd1/0x250
[    3.720056] RSP: 0000:ffff88003daadcc8  EFLAGS: 00010046
[    3.720056] RAX: 0000000000000246 RBX: ffff88003d803a60 RCX: 000000000000053e
[    3.720056] RDX: 000060ffc0002358 RSI: 0000000000000001 RDI: ffff88003c4e822c
[    3.720056] RBP: ffff88003daadd20 R08: ffff88003cc55000 R09: 0000000000000004
[    3.720056] R10: ffff88003c4e8000 R11: 0000000000000001 R12: 0000000000000000
[    3.720056] R13: ffffea0000e0e980 R14: ffff88003c4e8000 R15: 0000000000000001
[    3.720056] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[    3.720056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.720056] CR2: 000060ffc0002370 CR3: 0000000036754000 CR4: 00000000001406f0
[    3.720056] Stack:
[    3.720056]  ffffffff811a7bc8 ffff88003fffb780 ffffea0000e0e980 ffff88003cc55000
[    3.720056]  ffff88003c4e8000 ffff88003c4e822c ffff880036c1da00 ffffea0000e0e980
[    3.720056]  ffff88003fffbcc0 ffff88003d803a60 ffffea0000e0e9a0 ffff88003daadda8
[    3.720056] Call Trace:
[    3.720056]  [<ffffffff811a7bc8>] ? mem_cgroup_page_lruvec+0x28/0x90
[    3.720056]  [<ffffffff811a8427>] mem_cgroup_reparent_charges+0x257/0x460
[    3.720056]  [<ffffffff811a87df>] mem_cgroup_css_offline+0xaf/0x220
[    3.720056]  [<ffffffff810de897>] offline_css+0x27/0x50
[    3.720056]  [<ffffffff810e199d>] css_killed_work_fn+0x2d/0xa0
[    3.720056]  [<ffffffff81081032>] process_one_work+0x182/0x450
[    3.720056]  [<ffffffff81081dc1>] worker_thread+0x121/0x410
[    3.720056]  [<ffffffff81081ca0>] ? rescuer_thread+0x3d0/0x3d0
[    3.720056]  [<ffffffff81088ba0>] kthread+0xc0/0xd0
[    3.720056]  [<ffffffff81088ae0>] ? kthread_create_on_node+0x120/0x120
[    3.720056]  [<ffffffff816ff4fc>] ret_from_fork+0x7c/0xb0
[    3.720056]  [<ffffffff81088ae0>] ? kthread_create_on_node+0x120/0x120
[    3.720056] Code: d6 00 55 00 4d 85 e4 4c 8b 55 c8 4c 8b 45 c0 0f 85 a5 00 00 00 41 8b 55 18 85 d2 0f 88 99 00 00 00 49 8b 96 30 02 00 00 45 89 fb <4c> 39 5a 18 0f 8c c2 00 00 00 44 89 f9 f7 d9 89 ce 65 48 01 72 
[    3.720056] RIP  [<ffffffff811a6ae1>] mem_cgroup_move_account+0xd1/0x250
[    3.720056]  RSP <ffff88003daadcc8>
[    3.720056] CR2: 000060ffc0002370
[    3.720056] ---[ end trace 9ea086b6da9e6208 ]---

Joseph Salisbury (jsalisbury) wrote on 2015-12-17:

#25

Thanks for testing. I skipped that commit in the bisect, in case it's not related to the bug.

I built the next test kernel, up to the following commit:
cbbc58d4fdfab1a39a6ac1b41fcb17885952157a

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-17:

#26

Same crash on that version.

Joseph Salisbury (jsalisbury) wrote on 2015-12-18:

#27

I built the next test kernel, up to the following commit:
3b7834743f9492e3509930feb4ca47135905e640

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2015-12-20:

#28

That version has also crashed.

mm (mtl-0) wrote on 2015-12-30:

#29

This bug is also affecting me on two identical Xubuntu 15.10 systems:

uname -a: ### 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"

Also, writing 1 or 3 to /proc/sys/vm/drop_caches
temporarily solves the issue.

mm (mtl-0) wrote on 2015-12-30:

#30

Also see here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1476211

Joseph Salisbury (jsalisbury) wrote on 2016-01-04:

#31

I built the next test kernel, up to the following commit:
d7876f1be40a16223a44355740de625849504eb5

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2016-01-04:

#32

Crashed again.

Joseph Salisbury (jsalisbury) wrote on 2016-01-06:

#33

I built the next test kernel, up to the following commit:
732e563373ffc57d38a8a3b6d55f2de865182117

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2016-01-06:

#34

Crashed again.

Joseph Salisbury (jsalisbury) wrote on 2016-01-06:

#35

I built the next test kernel, up to the following commit:
56aba608257b451f663d25313d5ecae134d5557f

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2016-01-06:

#36

Crashed again.

Joseph Salisbury (jsalisbury) wrote on 2016-01-06:

#37

I built the next test kernel, up to the following commit:
59ab5a8f4445699e238c4c46b3da63bb9dc02897

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2016-01-07:

#38

Crashed again.

Joseph Salisbury (jsalisbury) wrote on 2016-01-07:

#39

I built the next test kernel, up to the following commit:
98fda169290b3b28c0f2db2b8f02290c13da50ef

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2016-01-07:

#40

Crashed again.

Joseph Salisbury (jsalisbury) wrote on 2016-01-07:

#41

I built the next test kernel, up to the following commit:
519192aaae38e24d6b32d3d55d791fe294981185

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1518457

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Sam Lade (sam-sentynel) wrote on 2016-01-07:

#42

Crashed again.

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-07:

#43

Sorry for commenting in the middle of your ongoing work, but I'm struggling to match your findings to my own.

Just like Sam and Sean reported, I see the bug on EC2 t2.micro running 15.10. I can reproduce it on t2.small as well, but it takes a little longer as I have to exhaust the free memory. I've installed several instances with different Ubuntu versions, keeping everything else the same. I only observe this bug on Ubuntu 15.10.

All versions were running the latest security updates:
14.04 (3.13.0-74): OK
15.04 (3.19.0-43): OK
15.10 (4.2.0-22): Not OK

If I understand the comments correctly, the hypothesis is that the bug was introduced between 3.12 final and 3.13-rc1. To me it seems to have been introduced between 3.19.0-43 and 4.2.0-22.

Is that worth double checking?

Sam Lade (sam-sentynel) wrote on 2016-01-07:

#44

I've found that how easy it is to reproduce varies somewhat. On some versions it's triggered by almost any file manipulation; on others I've had to copy a couple of large files around at the same time, or otherwise increase disk access while moving large files. Either way, I can reproduce it absolutely consistently in the versions listed above. In the affected versions I get large kswapd0 CPU usage spikes while copying files around, even if it doesn't always get stuck at 100% usage indefinitely (though I can always get it to do that with sufficient provocation). In unaffected versions, kswapd0 hardly shows up in CPU usage no matter what I do.

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-08:

#45

Interesting. For now, I think I should assume that the problem I see is not the same bug, although the symptoms are very similar: kswapd0 spins at 100%, swap space is not used, echo 1 > /proc/sys/vm/drop_caches resolves the problem temporarily, and the issue was first discovered on 15.10 on EC2.

Revision history for this message

Sergii Koshel (s-s-koshel) wrote on 2016-01-08:

#46

Btw, it is very easy to reproduce on a t2.nano AWS instance.

Linux ip-10-0-2-68 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-10:

#47

15.04-kswapd0.log (4.2 KiB, text/plain)

Could you share the exact commands to reproduce? I tried dd'ing to run out of memory on t2.micro (1 GB memory). Please see the attached command logs. Is this comparable with your results?

What I find interesting is that on 15.04, it runs out of free memory and starts waiting for I/O (no noteworthy use of kswapd0), while on 15.10 it starts using kswapd0 up to 100%, even a while after the dd command has completed.

Could you compare this with your own results?
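
To make results easier to compare, here is a rough sketch of one way to measure it. This is not the command sequence from the attached logs: the file path, the 512 MB size, the 30-second window and the function names are all assumptions chosen for illustration. It reads kswapd0's cumulative CPU time (utime + stime, fields 14 and 15 of /proc/PID/stat, in clock ticks) right after dd finishes and again 30 seconds later.

```shell
#!/bin/sh
# Hypothetical measurement helpers; nothing is executed until reproduce is called.

# Sum utime + stime (fields 14 and 15 of a /proc/PID/stat line) from stdin.
cpu_ticks() {
    awk '{ print $14 + $15 }'
}

# Write a test file with dd, then report how many clock ticks kswapd0
# used in the 30 seconds after dd has completed.
# $1: output file (assumed default /tmp/kswapd-test.bin)
# $2: size in MiB  (assumed default 512)
reproduce() {
    pid=$(pgrep -x kswapd0) || return 1
    dd if=/dev/zero of="${1:-/tmp/kswapd-test.bin}" bs=1M count="${2:-512}"
    before=$(cpu_ticks < "/proc/$pid/stat")
    sleep 30
    after=$(cpu_ticks < "/proc/$pid/stat")
    echo "kswapd0 ticks in 30s after dd: $((after - before))"
}

# Example usage (run manually):
#   reproduce /tmp/kswapd-test.bin 512
```

The tick rate can be checked with getconf CLK_TCK; at 100 ticks per second, a result near 3000 would mean kswapd0 kept one core busy for the whole window.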

Sergii/Sean: do you see the bug on any kernel versions older than those that come with 15.10?

Revision history for this message

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-10:

#48

15.10-kswapd0.log (8.5 KiB, text/plain)

Revision history for this message

Simon (sam-ua) wrote on 2016-01-10:

#49

Issue on AWS t2.micro
root@ip-127-0-0-1:~# uname -a
Linux ip-127-0-0-1 4.2.0-23-generic #28-Ubuntu SMP Sun Dec 27 17:47:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@ip-127-0-0-1:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"

Revision history for this message

mm (mtl-0) wrote on 2016-01-10:

#50

I can also confirm the issue on Amazon AWS t2.nano
Linux ip-172-31-5-83 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

After dd'ing a 512 MB file:

top - 18:36:00 up 9 min,  1 user,  load average: 1.03, 0.64, 0.30
Tasks:  74 total,   3 running,  71 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us, 27.0 sy,  0.2 ni, 41.3 id,  9.8 wa,  0.0 hi,  0.0 si, 21.2 st
KiB Mem:    498852 total,    86092 used,   412760 free,     4852 buffers
KiB Swap:        0 total,        0 used,        0 free.    15444 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
   32 root      20   0       0      0      0 R 98.4  0.0   1:59.16 kswapd0
    1 root      20   0   37432   2552   1044 S  0.0  0.5   0:02.77 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.00 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S  0.0  0.0   0:00.02 kworker/u30:0
    7 root      20   0       0      0      0 S  0.0  0.0   0:00.03 rcu_sched
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 R  0.0  0.0   0:00.02 rcuos/0
   10 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcuob/0
   11 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 migration/0
   12 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 watchdog/0
   13 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 khelper
   14 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kdevtmpfs
   15 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 netns
   16 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 perf
   17 root      20   0       0      0      0 S  0.0  0.0   0:00.01 xenwatch

root@ip-172-31-5-83:/home/ubuntu# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.4  0.5  37432  2552 ?        Ss   18:26   0:02 /sbin/init
root         2  0.0  0.0      0     0 ?        S    18:26   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    18:26   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   18:26   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S    18:26   0:00 [kworker/u30:0]
root         7  0.0  0.0      0     0 ?        S    18:26   0:00 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    18:26   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    18:26   0:00 [rcuos/0]
root        10  0.0  0.0      0     0 ?        S    18:26   0:00 [rcuob/0]
root        11  0.0  0.0      0     0 ?        S    18:26   0:00 [migration/0]
root        12  0.0  0.0      0     0 ?        S    18:26   0:00 [watchdog/0]
root        13  0.0  0.0      0     0 ?        S<   18:26   0:00 [khelper]
root        14  0.0  0.0      0     0 ?        S    18:26   0:00 [kdevtmpfs]
root        15  0.0  0.0      0     0 ?        S<   18:26   0:00 [netns]
root        16  0.0  0.0      0     0 ?        S<   18:26   0:00 [perf]
root        17  0.0  0.0      0     0 ?        S    18:26   0:00 [xenwatch]
root        18  0.0  0.0      0     0 ?        S    18:26   0:00 [xenbus]
root        20  0.0  0.0      0     0 ?        S    18:26   0:00 [khungtaskd]
root        21  0.0  0.0      0     0 ?        S<   18:26   0:00 [writeback]
root        22  0.0  0.0      0     0 ?        SN   18:26   0:00 [ksmd]
root        23  0.0  0.0      0     0 ?        S<   18:26   0:00 [crypto]
root        24  0.0  0.0      0     0 ?        S<   18:26   0:00 [kintegrityd]
root        25  0.0  0.0      0     0 ?        S<   18:26   0:00 [bioset]
root        26  0.0  0.0      0     0 ?        S<   18:26   0:00 [kblockd]
root        27  0.0  0.0      0     0 ?        S<   18:26   0:00 [ata_sff]
root        28  0.0  0.0      0     0 ?        S<   18:26   0:00 [md]
root        29  0.0  0.0      0     0 ?        S<   18:26   0:00 [devfreq_wq]
root        30  0.0  0.0      0     0 ?        S    18:26   0:00 [kworker/u30:1]
root        32 23.3  0.0      0     0 ?        R    18:26   2:11 [kswapd0]
root        33  0.0  0.0      0     0 ?        S    18:26   0:00 [fsnotify_mark]
root        34  0.0  0.0      0     0 ?        S    18:26   0:00 [ecryptfs-kthrea]
root        45  0.0  0.0      0     0 ?        S<   18:26   0:00 [kthrotld]
root        46  0.0  0.0      0     0 ?        S<   18:26   0:00 [acpi_thermal_pm]
root        47  0.0  0.0      0     0 ?        S    18:26   0:00 [scsi_eh_0]
root        48  0.0  0.0      0     0 ?        S<   18:26   0:00 [scsi_tmf_0]
root        49  0.0  0.0      0     0 ?        S    18:26   0:00 [scsi_eh_1]
root        50  0.0  0.0      0     0 ?        S<   18:26   0:00 [scsi_tmf_1]
root        55  0.0  0.0      0     0 ?        S<   18:26   0:00 [ipv6_addrconf]
root        74  0.0  0.0      0     0 ?        S<   18:26   0:00 [deferwq]
root        75  0.0  0.0      0     0 ?        S<   18:26   0:00 [charger_manager]
root       129  0.0  0.0      0     0 ?        S<   18:26   0:00 [kpsmoused]
root       140  0.0  0.0      0     0 ?        S    18:26   0:00 [kworker/0:2]
root       246  0.0  0.0      0     0 ?        S    18:26   0:00 [jbd2/xvda1-8]
root       247  0.0  0.0      0     0 ?        S<   18:26   0:00 [ext4-rsv-conver]
root       301  0.0  0.0      0     0 ?        S    18:27   0:00 [kauditd]
root       311  0.0  0.8  39756  4248 ?        Ss   18:27   0:00 /lib/systemd/systemd-journald
root       319  0.0  0.0      0     0 ?        S    18:27   0:00 [kworker/0:10]
root       338  0.0  0.5  43284  2528 ?        Ss   18:27   0:00 /lib/systemd/systemd-udevd
systemd+   388  0.0  0.1 100300   636 ?        Ssl  18:27   0:00 /lib/systemd/systemd-timesyncd
root       505  0.0  0.0      0     0 ?        S<   18:27   0:00 [kworker/0:1H]
root       574  0.0  1.3  23460  6888 ?        Ss   18:27   0:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
message+   653  0.0  0.1  42856   944 ?        Ss   18:27   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       668  0.0  0.1  28540   756 ?        Ss   18:27   0:00 /lib/systemd/systemd-logind
root       669  0.0  0.1  33612   712 ?        Ss   18:27   0:00 /sbin/cgmanager -m name=systemd
daemon     670  0.0  0.0  19180   420 ?        Ss   18:27   0:00 /usr/sbin/atd -f
syslog     679  0.0  0.2 260604  1224 ?        Ssl  18:27   0:00 /usr/sbin/rsyslogd -n
root       682  0.0  0.2 280228  1400 ?        Ssl  18:27   0:00 /usr/lib/accountsservice/accounts-daemon
root       683  0.0  0.0  29848   232 ?        Ss   18:27   0:00 /usr/bin/lxcfs /var/lib/lxcfs/
root       684  0.0  0.1  26052   716 ?        Ss   18:27   0:00 /usr/sbin/cron -f
lxc-dns+   809  0.0  0.1  45272   844 ?        S    18:27   0:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --listen-address 10.0.3.
root       826  0.0  0.2 277036  1300 ?        Ssl  18:27   0:00 /usr/lib/policykit-1/polkitd --no-debug
root      1018  0.0  0.2  69932  1236 ?        Ss   18:28   0:00 /usr/sbin/sshd -D
root      1026  0.0  0.1  14576   532 tty1     Ss+  18:28   0:00 /sbin/agetty --noclear tty1 linux
root      1027  0.0  0.0  12824   416 ttyS0    Ss+  18:28   0:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
root      1119  0.0  0.2  99776  1452 ?        Ss   18:28   0:00 sshd: ubuntu [priv]
ubuntu    1121  0.0  0.3  45168  1680 ?        Ss   18:28   0:00 /lib/systemd/systemd --user
ubuntu    1122  0.0  0.3  58792  1604 ?        S    18:28   0:00 (sd-pam)
ubuntu    1190  0.0  0.2  99776  1464 ?        S    18:29   0:00 sshd: ubuntu@pts/0
ubuntu    1191  0.0  0.5  21356  2620 pts/0    Ss   18:29   0:00 -bash
root      1216  0.0  0.3  55524  1736 pts/0    S    18:30   0:00 sudo su
root      1217  0.0  0.2  48812  1240 pts/0    S    18:30   0:00 su
root      1218  0.0  0.5  19836  2632 pts/0    S    18:30   0:00 bash
root      1238  0.0  0.0      0     0 ?        S    18:32   0:00 [kworker/u30:2]
root      1255  0.0  0.5  17200  2624 pts/0    R+   18:36   0:00 ps -aux

root@ip-172-31-5-83:/home/ubuntu# dmesg
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.2.0-22-generic (buildd@lcy01-22) (gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2) ) #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 (Ubuntu 4.2.0-22.27-generic 4.2.6)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.2.0-22-generic root=UUID=8b439d74-d58f-493e-ae94-3074e2fcfe1d ro console=tty1 console=ttyS0 net.ifnames=0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 0x340 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001fffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.4 present.
[    0.000000] DMI: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[    0.000000] Hypervisor detected: Xen
[    0.000000] Xen version 4.2.
[    0.000000] Xen Platform PCI: I/O protocol version 1
[    0.000000] Netfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated NICs.
[    0.000000] Blkfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated disks.
               You might have to change the root device
               from /dev/hd[a-d] to /dev/xvd[a-d]
               in your root= kernel command line option
[    0.000000] HVMOP_pagetable_dying not supported
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x20000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF write-combining
[    0.000000]   C0000-FFFFF write-back
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 0000F0000000 mask 3FFFF8000000 uncachable
[    0.000000]   1 base 0000F8000000 mask 3FFFFC000000 uncachable
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
[    0.000000] found SMP MP-table at [mem 0x000fbba0-0x000fbbaf] mapped at [ffff8800000fbba0]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Base memory trampoline at [ffff880000098000] 98000 size 24576
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000]  [mem 0x00000000-0x000fffff] page 4k
[    0.000000] BRK [0x01ff1000, 0x01ff1fff] PGTABLE
[    0.000000] BRK [0x01ff2000, 0x01ff2fff] PGTABLE
[    0.000000] BRK [0x01ff3000, 0x01ff3fff] PGTABLE
[    0.000000] init_memory_mapping: [mem 0x1fe00000-0x1fffffff]
[    0.000000]  [mem 0x1fe00000-0x1fffffff] page 2M
[    0.000000] init_memory_mapping: [mem 0x00100000-0x1fdfffff]
[    0.000000]  [mem 0x00100000-0x001fffff] page 4k
[    0.000000]  [mem 0x00200000-0x1fdfffff] page 2M
[    0.000000] RAMDISK: [mem 0x1f20e000-0x1f7adfff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000EA020 000024 (v02 Xen   )
[    0.000000] ACPI: XSDT 0x00000000FC00F5A0 000054 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: FACP 0x00000000FC00F260 0000F4 (v04 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: DSDT 0x00000000FC0035E0 00BBF6 (v02 Xen    HVM      00000000 INTL 20090123)
[    0.000000] ACPI: FACS 0x00000000FC0035A0 000040
[    0.000000] ACPI: FACS 0x00000000FC0035A0 000040
[    0.000000] ACPI: APIC 0x00000000FC00F360 0000D8 (v02 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: HPET 0x00000000FC00F4B0 000038 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: WAET 0x00000000FC00F4F0 000028 (v01 Xen    HVM      00000000 HVML 00000000)
[    0.000000] ACPI: SSDT 0x00000000FC00F520 000031 (v02 Xen    HVM      00000000 INTL 20090123)
[    0.000000] ACPI: SSDT 0x00000000FC00F560 000031 (v02 Xen    HVM      00000000 INTL 20090123)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x1fffb000-0x1fffffff]
[    0.000000]  [ffffea0000000000-ffffea00007fffff] PMD -> [ffff88001ea00000-ffff88001f1fffff] on node 0
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x000000001fffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009dfff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000001fffffff]
[    0.000000] On node 0 totalpages: 130973
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 21 pages reserved
[    0.000000]   DMA zone: 3997 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 1984 pages used for memmap
[    0.000000]   DMA32 zone: 126976 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-47
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] smpboot: Allowing 15 CPUs, 14 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] e820: [mem 0x20000000-0xfbffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen HVM
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:15 nr_node_ids:1
[    0.000000] PERCPU: Embedded 33 pages/cpu @ffff88001e600000 s96728 r8192 d30248 u262144
[    0.000000] pcpu-alloc: s96728 r8192 d30248 u262144 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 --
[    0.000000] xen: PV spinlocks enabled
[    0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 128904
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.2.0-22-generic root=UUID=8b439d74-d58f-493e-ae94-3074e2fcfe1d ro console=tty1 console=ttyS0 net.ifnames=0
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Calgary: detecting Calgary via BIOS EBDA area
[    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[    0.000000] Memory: 491268K/523892K available (8148K kernel code, 1237K rwdata, 3800K rodata, 1460K init, 1292K bss, 32624K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=15, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=15.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=15
[    0.000000] NR_IRQS:16640 nr_irqs:952 16
[    0.000000] xen:events: Using 2-level ABI
[    0.000000] xen:events: Xen HVM callback vector for event delivery is enabled
[    0.000000]  Offload RCU callbacks from all CPUs
[    0.000000]  Offload RCU callbacks from CPUs: 0-14.
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty1] enabled
[    0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
[    0.000000] console [ttyS0] enabled
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 30580167144 ns
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Detected 2400.036 MHz processor
[    0.008000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.07 BogoMIPS (lpj=9600144)
[    0.016004] pid_max: default: 32768 minimum: 301
[    0.020014] ACPI: Core revision 20150619
[    0.033954] ACPI: All ACPI Tables successfully acquired
[    0.040029] Security Framework initialized
[    0.044016] AppArmor: AppArmor initialized
[    0.048002] Yama: becoming mindful.
[    0.052065] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.060098] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.068046] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.072004] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes)
[    0.080171] Initializing cgroup subsys blkio
[    0.088005] Initializing cgroup subsys memory
[    0.092011] Initializing cgroup subsys devices
[    0.096003] Initializing cgroup subsys freezer
[    0.100003] Initializing cgroup subsys net_cls
[    0.104003] Initializing cgroup subsys perf_event
[    0.112004] Initializing cgroup subsys net_prio
[    0.116003] Initializing cgroup subsys hugetlb
[    0.120056] CPU: Physical Processor ID: 0
[    0.124743] mce: CPU supports 2 MCE banks
[    0.128020] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[    0.136002] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    0.159504] ftrace: allocating 30910 entries in 121 pages
[    0.188675] x2apic: IRQ remapping doesn't support X2APIC mode
[    0.196003] Switched APIC routing to physical flat.
[    0.202098] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=0 pin2=0
[    0.249577] clocksource: xen: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.256007] Xen: using vcpuop timer interface
[    0.256013] installing Xen timer for CPU 0
[    0.260052] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz (fam: 06, model: 3f, stepping: 02)
[    0.268027] cpu 0 spinlock event irq 53
[    0.271942] Performance Events: unsupported p6 CPU model 63 no PMU driver, software events only.
[    0.276737] x86: Booted up 1 node, 1 CPUs
[    0.280007] smpboot: Total of 1 processors activated (4800.07 BogoMIPS)
[    0.284010] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.288003] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.292349] devtmpfs: initialized
[    0.297159] evm: security.selinux
[    0.300005] evm: security.SMACK64
[    0.303716] evm: security.SMACK64EXEC
[    0.304005] evm: security.SMACK64TRANSMUTE
[    0.308003] evm: security.SMACK64MMAP
[    0.312003] evm: security.ima
[    0.315722] evm: security.capability
[    0.316141] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.320097] pinctrl core: initialized pinctrl subsystem
[    0.324129] RTC time: 18:26:47, date: 01/10/16
[    0.328133] NET: Registered protocol family 16
[    0.332153] cpuidle: using governor ladder
[    0.336014] cpuidle: using governor menu
[    0.340075] ACPI: bus type PCI registered
[    0.344005] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.348389] PCI: Using configuration type 1 for base access
[    0.353036] ACPI: Added _OSI(Module Device)
[    0.356005] ACPI: Added _OSI(Processor Device)
[    0.360003] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.364006] ACPI: Added _OSI(Processor Aggregator Device)
[    0.369023] xen: --> pirq=16 -> irq=9 (gsi=9)
[    0.371647] ACPI: Interpreter enabled
[    0.372008] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20150619/hwxface-580)
[    0.380004] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20150619/hwxface-580)
[    0.387055] ACPI: (supports S0 S3 S4 S5)
[    0.388003] ACPI: Using IOAPIC for interrupt routing
[    0.392031] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.457209] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.460013] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.464014] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.468058] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.472766] acpiphp: Slot [0] registered
[    0.477601] acpiphp: Slot [3] registered
[    0.480385] acpiphp: Slot [4] registered
[    0.484388] acpiphp: Slot [5] registered
[    0.488372] acpiphp: Slot [6] registered
[    0.492412] acpiphp: Slot [7] registered
[    0.496360] acpiphp: Slot [8] registered
[    0.500496] acpiphp: Slot [9] registered
[    0.504742] acpiphp: Slot [10] registered
[    0.508376] acpiphp: Slot [11] registered
[    0.512377] acpiphp: Slot [12] registered
[    0.516580] acpiphp: Slot [13] registered
[    0.520399] acpiphp: Slot [14] registered
[    0.525385] acpiphp: Slot [15] registered
[    0.528788] acpiphp: Slot [16] registered
[    0.532401] acpiphp: Slot [17] registered
[    0.536405] acpiphp: Slot [18] registered
[    0.540406] acpiphp: Slot [19] registered
[    0.544488] acpiphp: Slot [20] registered
[    0.548519] acpiphp: Slot [21] registered
[    0.552441] acpiphp: Slot [22] registered
[    0.556506] acpiphp: Slot [23] registered
[    0.560403] acpiphp: Slot [24] registered
[    0.564398] acpiphp: Slot [25] registered
[    0.568378] acpiphp: Slot [26] registered
[    0.572522] acpiphp: Slot [27] registered
[    0.576338] acpiphp: Slot [28] registered
[    0.580363] acpiphp: Slot [29] registered
[    0.584471] acpiphp: Slot [30] registered
[    0.588676] acpiphp: Slot [31] registered
[    0.592364] PCI host bridge to bus 0000:00
[    0.596006] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.600004] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.604014] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.608008] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.612007] pci_bus 0000:00: root bus resource [mem 0xf0000000-0xfbffffff window]
[    0.616311] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    0.618280] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    0.621316] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    0.624548] pci 0000:00:01.1: reg 0x20: [io  0xc100-0xc10f]
[    0.625614] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.628008] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.632006] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.636007] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.641393] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    0.641675] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
               * this clock source is slow. Consider trying other clock sources
[    0.645558] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    0.649085] pci 0000:00:02.0: [1013:00b8] type 00 class 0x030000
[    0.649592] pci 0000:00:02.0: reg 0x10: [mem 0xf0000000-0xf1ffffff pref]
[    0.650014] pci 0000:00:02.0: reg 0x14: [mem 0xf3000000-0xf3000fff]
[    0.652462] pci 0000:00:03.0: [5853:0001] type 00 class 0xff8000
[    0.653341] pci 0000:00:03.0: reg 0x10: [io  0xc000-0xc0ff]
[    0.653644] pci 0000:00:03.0: reg 0x14: [mem 0xf2000000-0xf2ffffff pref]
[    0.658618] ACPI: PCI Interrupt Link [LNKA] (IRQs *5 10 11)
[    0.664263] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.671933] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.676297] ACPI: PCI Interrupt Link [LNKD] (IRQs *5 10 11)
[    0.705541] ACPI: Enabled 2 GPEs in block 00 to 0F
[    0.708044] xen:balloon: Initialising balloon driver
[    0.716025] xen_balloon: Initialising balloon driver
[    0.720126] vgaarb: setting as boot device: PCI:0000:00:02.0
[    0.724000] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.724010] vgaarb: loaded
[    0.727870] vgaarb: bridge control possible 0000:00:02.0
[    0.728175] init_memory_mapping: [mem 0x20000000-0x27ffffff]
[    0.732005]  [mem 0x20000000-0x27ffffff] page 2M
[    0.732884]  [ffffea0000800000-ffffea00009fffff] PMD -> [ffff88001da00000-ffff88001dbfffff] on node 0
[    0.732938] SCSI subsystem initialized
[    0.736044] libata version 3.00 loaded.
[    0.736066] ACPI: bus type USB registered
[    0.740028] usbcore: registered new interface driver usbfs
[    0.744011] usbcore: registered new interface driver hub
[    0.748020] usbcore: registered new device driver usb
[    0.752137] PCI: Using ACPI for IRQ routing
[    0.756006] PCI: pci_cache_line_size set to 64 bytes
[    0.756571] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
[    0.756684] NetLabel: Initializing
[    0.760004] NetLabel:  domain hash size = 128
[    0.764004] NetLabel:  protocols = UNLABELED CIPSOv4
[    0.768015] NetLabel:  unlabeled traffic allowed by default
[    0.772090] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.776017] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.782375] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
[    0.787073] clocksource: Switched to clocksource xen
[    0.794102] AppArmor: AppArmor Filesystem Enabled
[    0.798669] pnp: PnP ACPI init
[    0.802157] system 00:00: [mem 0x00000000-0x0009ffff] could not be reserved
[    0.810376] system 00:00: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.810457] system 00:01: [io  0x08a0-0x08a3] has been reserved
[    0.816516] system 00:01: [io  0x0cc0-0x0ccf] has been reserved
[    0.822414] system 00:01: [io  0x04d0-0x04d1] has been reserved
[    0.828243] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.828268] xen: --> pirq=17 -> irq=8 (gsi=8)
[    0.828288] pnp 00:02: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.828307] xen: --> pirq=18 -> irq=12 (gsi=12)
[    0.828321] pnp 00:03: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.828335] xen: --> pirq=19 -> irq=1 (gsi=1)
[    0.828347] pnp 00:04: Plug and Play ACPI device, IDs PNP0303 PNP030b (active)
[    0.828360] xen: --> pirq=20 -> irq=6 (gsi=6)
[    0.828362] pnp 00:05: [dma 2]
[    0.828374] pnp 00:05: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.828393] xen: --> pirq=21 -> irq=4 (gsi=4)
[    0.828406] pnp 00:06: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.828439] system 00:07: [io  0x10c0-0x1141] has been reserved
[    0.835889] system 00:07: [io  0xb044-0xb047] has been reserved
[    0.841914] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.865915] pnp: PnP ACPI: found 8 devices
[    0.875866] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.884236] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    0.884238] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    0.884240] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    0.884241] pci_bus 0000:00: resource 7 [mem 0xf0000000-0xfbffffff window]
[    0.884277] NET: Registered protocol family 2
[    0.889317] TCP established hash table entries: 4096 (order: 3, 32768 bytes)
[    0.896188] TCP bind hash table entries: 4096 (order: 4, 65536 bytes)
[    0.902456] TCP: Hash tables configured (established 4096 bind 4096)
[    0.909178] UDP hash table entries: 256 (order: 1, 8192 bytes)
[    0.915583] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[    0.923462] NET: Registered protocol family 1
[    0.927634] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.933727] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.938804] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.944771] pci 0000:00:02.0: Video device with shadowed ROM
[    0.944836] PCI: CLS 0 bytes, default 64
[    0.944904] Trying to unpack rootfs image as initramfs...
[    1.918341] Freeing initrd memory: 5760K (ffff88001f20e000 - ffff88001f7ae000)
[    1.925681] RAPL PMU detected, API unit is 2^-32 Joules, 3 fixed counters 655360 ms ovfl timer
[    1.933270] hw unit of domain pp0-core 2^-14 Joules
[    1.938299] hw unit of domain package 2^-14 Joules
[    1.943812] hw unit of domain dram 2^-16 Joules
[    1.949483] microcode: CPU0 sig=0x306f2, pf=0x1, revision=0x25
[    1.956101] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    1.964170] Scanning for low memory corruption every 60 seconds
[    1.971284] futex hash table entries: 4096 (order: 6, 262144 bytes)
[    1.979020] Initialise system trusted keyring
[    1.983900] audit: initializing netlink subsys (disabled)
[    1.989124] audit: type=2000 audit(1452450409.469:1): initialized
[    1.994901] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    2.003740] zpool: loaded
[    2.007969] zbud: loaded
[    2.011418] VFS: Disk quotas dquot_6.6.0
[    2.015678] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    2.022292] fuse init (API version 7.23)
[    2.026324] Key type big_key registered
[    2.030737] Key type asymmetric registered
[    2.035244] Asymmetric key parser 'x509' registered
[    2.041175] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[    2.049578] io scheduler noop registered
[    2.053605] io scheduler deadline registered (default)
[    2.058109] io scheduler cfq registered
[    2.062249] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    2.067820] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    2.075405] intel_idle: does not run on family 6 model 63
[    2.075484] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    2.083291] ACPI: Power Button [PWRF]
[    2.088137] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[    2.095109] ACPI: Sleep Button [SLPF]
[    2.099913] GHES: HEST is not enabled!
[    2.105670] xen: --> pirq=22 -> irq=28 (gsi=28)
[    2.105779] xen:grant_table: Grant tables using version 1 layout
[    2.113210] Grant table initialized
[    2.117138] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22!
[    2.122866] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    2.161452] 00:06: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    2.171661] Linux agpgart interface v0.103
[    2.178792] brd: module loaded
[    2.184392] loop: module loaded
[    2.194347] ata_piix 0000:00:01.1: version 2.13
[    2.195341] scsi host0: ata_piix
[    2.200409] scsi host1: ata_piix
[    2.205156] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc100 irq 14
[    2.212055] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc108 irq 15
[    2.228305] libphy: Fixed MDIO Bus: probed
[    2.234120] tun: Universal TUN/TAP device driver, 1.6
[    2.239155] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    2.244948] PPP generic driver version 2.4.2
[    2.249616] xen_netfront: Initialising Xen virtual ethernet driver
[    2.279258] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    2.285826] ehci-pci: EHCI PCI platform driver
[    2.290630] ehci-platform: EHCI generic platform driver
[    2.296769] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    2.305654] ohci-pci: OHCI PCI platform driver
[    2.310341] ohci-platform: OHCI generic platform driver
[    2.315400] uhci_hcd: USB Universal Host Controller Interface driver
[    2.321193] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[    2.332441] serio: i8042 KBD port at 0x60,0x64 irq 1
[    2.338263] serio: i8042 AUX port at 0x60,0x64 irq 12
[    2.343693] mousedev: PS/2 mouse device common for all mice
[    2.350541] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[    2.360099] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[    2.366321] rtc_cmos 00:02: alarms up to one day, 114 bytes nvram, hpet irqs
[    2.373527] i2c /dev entries driver
[    2.377623] device-mapper: uevent: version 1.0.3
[    2.383453] device-mapper: ioctl: 4.33.0-ioctl (2015-8-18) initialised: dm-devel@redhat.com
[    2.394007] ledtrig-cpu: registered to indicate activity on CPUs
[    2.399950] PCCT header not found.
[    2.404187] NET: Registered protocol family 10
[    2.409646] NET: Registered protocol family 17
[    2.414944] Key type dns_resolver registered
[    2.420927] Loading compiled-in X.509 certificates
[    2.426812] Loaded X.509 cert 'Build time autogenerated kernel key: 5ba1353109d27849ddd6fbd554a5534438beafab'
[    2.436748] registered taskstats version 1
[    2.455938] zswap: loading zswap
[    2.459802] zswap: using zbud pool
[    2.463625] zswap: using lzo compressor
[    2.468765] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
[    2.479167] Key type trusted registered
[    2.487682] Key type encrypted registered
[    2.492567] AppArmor: AppArmor sha1 policy hashing enabled
[    2.498339] ima: No TPM chip found, activating TPM-bypass!
[    2.503734] evm: HMAC attrs: 0x1
[    2.507168]  xvda: xvda1
[    2.608104] xenbus_probe_frontend: Device with no driver: device/vfb/0
[    2.614787]   Magic number: 0:5:443
[    2.618877] rtc_cmos 00:02: setting system clock to 2016-01-10 18:26:50 UTC (1452450410)
[    2.627603] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    2.636395] EDD information not available.
[    2.643987] PM: Hibernation image not present or could not be loaded.
[    2.645549] Freeing unused kernel memory: 1460K (ffffffff81d37000 - ffffffff81ea4000)
[    2.654532] Write protecting the kernel read-only data: 12288k
[    2.660287] Freeing unused kernel memory: 32K (ffff8800017f8000 - ffff880001800000)
[    2.668468] Freeing unused kernel memory: 296K (ffff880001bb6000 - ffff880001c00000)
[    2.690364] random: udevadm urandom read with 22 bits of entropy available
[    2.762779] FDC 0 is a S82078B
[    2.831487] AVX2 version of gcm_enc/dec engaged.
[    2.836578] AES CTR mode by8 optimization enabled
[    2.924089] tsc: Refined TSC clocksource calibration: 2399.998 MHz
[    2.932178] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x229835b7123, max_idle_ns: 440795242976 ns
[    3.838651] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
[    9.588528] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
[   11.187675] random: nonblocking pool is initialized
[   14.609918] systemd[1]: Failed to insert module 'kdbus': Function not implemented
[   15.242105] systemd[1]: systemd 225 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
[   15.259046] systemd[1]: Detected virtualization xen.
[   15.262406] systemd[1]: Detected architecture x86-64.
[   15.270552] systemd[1]: Set hostname to <ubuntu>.
[   15.378663] systemd[1]: Initializing machine ID from random generator.
[   15.382289] systemd[1]: Installed transient /etc/machine-id file.
[   18.620994] systemd[1]: sshd-keygen.service: Cannot add dependency job, ignoring: Unit sshd-keygen.service failed to load: No such file or directory.
[   18.629655] systemd[1]: display-manager.service: Cannot add dependency job, ignoring: Unit display-manager.service failed to load: No such file or directory.
[   18.638849] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[   18.649419] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[   18.660465] systemd[1]: Reached target Encrypted Volumes.
[   18.667202] systemd[1]: Reached target Swap.
[   18.672851] systemd[1]: Reached target Remote File Systems (Pre).
[   18.681031] systemd[1]: Created slice Root Slice.
[   18.687904] systemd[1]: Listening on udev Kernel Socket.
[   18.694810] systemd[1]: Listening on Journal Audit Socket.
[   18.702067] systemd[1]: Listening on /dev/initctl Compatibility Named Pipe.
[   18.713335] systemd[1]: Listening on Journal Socket (/dev/log).
[   18.722692] systemd[1]: Created slice User and Session Slice.
[   18.729575] systemd[1]: Listening on Journal Socket.
[   18.737302] systemd[1]: Listening on udev Control Socket.
[   18.744259] systemd[1]: Reached target User and Group Name Lookups.
[   18.752401] systemd[1]: Created slice System Slice.
[   18.759971] systemd[1]: Starting Setup Virtual Console...
[   18.769014] systemd[1]: Mounting Debug File System...
[   18.777601] systemd[1]: Created slice system-getty.slice.
[   18.784968] systemd[1]: Starting Create list of required static device nodes for the current kernel...
[   18.796732] systemd[1]: Starting Increase datagram queue length...
[   18.807772] systemd[1]: Starting Load Kernel Modules...
[   18.815365] systemd[1]: Mounting POSIX Message Queue File System...
[   18.823655] systemd[1]: Created slice system-serial\x2dgetty.slice.
[   18.856984] systemd[1]: Starting Uncomplicated firewall...
[   18.864735] systemd[1]: Starting Remount Root and Kernel File Systems...
[   18.873785] systemd[1]: Reached target Slices.
[   18.881302] systemd[1]: Mounting Huge Pages File System...
[   18.889300] systemd[1]: Starting udev Coldplug all Devices...
[   18.898284] systemd[1]: Started Setup Virtual Console.
[   18.944194] systemd[1]: Mounted POSIX Message Queue File System.
[   18.952215] systemd[1]: Mounted Debug File System.
[   18.955529] systemd[1]: Mounted Huge Pages File System.
[   18.967368] systemd[1]: Started Increase datagram queue length.
[   18.976241] systemd[1]: Listening on Syslog Socket.
[   18.982396] systemd[1]: Starting Journal Service...
[   19.044175] EXT4-fs (xvda1): re-mounted. Opts: (null)
[   19.048701] systemd[1]: Started Create list of required static device nodes for the current kernel.
[   19.059577] systemd[1]: Started Load Kernel Modules.
[   19.072792] systemd[1]: Started Uncomplicated firewall.
[   19.081762] systemd[1]: Started Remount Root and Kernel File Systems.
[   19.097361] systemd[1]: Starting Load/Save Random Seed...
[   19.111568] systemd[1]: Starting Ensure /etc/mtab is a symlink to /proc/mounts...
[   19.129279] systemd[1]: Starting Apply Kernel Variables...
[   19.140824] systemd[1]: Mounting FUSE Control File System...
[   19.146253] systemd[1]: Starting Create Static Device Nodes in /dev...
[   19.156875] systemd[1]: Mounted FUSE Control File System.
[   19.170785] systemd[1]: Started udev Coldplug all Devices.
[   19.233069] systemd[1]: Started Load/Save Random Seed.
[   19.354467] systemd[1]: Started Ensure /etc/mtab is a symlink to /proc/mounts.
[   19.367148] systemd[1]: Started Apply Kernel Variables.
[   19.985039] systemd[1]: Started Journal Service.
[   20.178207] systemd-journald[311]: Received request to flush runtime journal from PID 1
[   21.996298] ifquery[398]: segfault at 1 ip 0000000000403187 sp 00007ffe89165ae0 error 4 in ifup[400000+d000]
[   22.307144] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 123648
[   22.307147] Policy zone: DMA32
[   22.307577] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 123648
[   22.307578] Policy zone: Normal
[   22.581164] ppdev: user-space parallel port driver
[   22.698923] Console: switching to colour frame buffer device 100x37
[   23.563917] audit: type=1400 audit(1452450431.440:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=495 comm="apparmor_parser"
[   23.564250] audit: type=1400 audit(1452450431.444:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=495 comm="apparmor_parser"
[   23.564489] audit: type=1400 audit(1452450431.444:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-nesting" pid=495 comm="apparmor_parser"
[   23.704146] audit: type=1400 audit(1452450431.584:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/sbin/dhclient" pid=509 comm="apparmor_parser"
[   23.704437] audit: type=1400 audit(1452450431.584:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=509 comm="apparmor_parser"
[   23.704661] audit: type=1400 audit(1452450431.584:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=509 comm="apparmor_parser"
[   23.704877] audit: type=1400 audit(1452450431.584:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=509 comm="apparmor_parser"
[   23.712485] audit: type=1400 audit(1452450431.592:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=510 comm="apparmor_parser"
[   23.786342] audit: type=1400 audit(1452450431.664:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=512 comm="apparmor_parser"
[   27.796134] xenbus_probe_frontend: Waiting for devices to initialise: 25s...20s...15s...10s...5s...0s...

[   52.708683] xenbus_probe_frontend: Timeout connecting to device: device/vfb/0 (local state 3, remote state 1)
[   54.163590] cgroup: new mount options do not match the existing superblock, will be ignored
[   54.562406] ip_tables: (C) 2000-2006 Netfilter Core Team
[   54.683257] bridge: automatic filtering via arp/ip/ip6tables has been deprecated. Update your scripts to load br_netfilter if you need this.
[   55.182401] nf_conntrack version 0.5.0 (3897 buckets, 15588 max)
[   57.811546] audit: type=1400 audit(1452450465.685:11): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/bin/lxc-start" pid=828 comm="apparmor_parser"
[   57.815158] audit: type=1400 audit(1452450465.689:12): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default" pid=832 comm="apparmor_parser"
[   57.815165] audit: type=1400 audit(1452450465.689:13): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-with-mounting" pid=832 comm="apparmor_parser"
[   57.815169] audit: type=1400 audit(1452450465.689:14): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-with-nesting" pid=832 comm="apparmor_parser"
[   73.966169] EXT4-fs (xvda1): resizing filesystem from 2094474 to 3930147 blocks
[   76.050816] EXT4-fs (xvda1): resized filesystem to 3930147

root@ip-172-31-5-83:/home/ubuntu# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.10
DISTRIB_CODENAME=wily
DISTRIB_DESCRIPTION="Ubuntu 15.10"

mm (mtl-0) wrote on 2016-01-10:

#51

I tried hard, running dd and filling up RAM, but I cannot reproduce the issue under Ubuntu 15.04 on an AWS EC2 nano instance. The issue didn't appear on my home server boxes under Ubuntu 15.04 either, so I think it was introduced between 15.04 and 15.10.

Linux ip-172-31-7-84 3.19.0-20-generic #20-Ubuntu SMP Fri May 29 10:10:47 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@ip-172-31-7-84:/home/ubuntu# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.04
DISTRIB_CODENAME=vivid
DISTRIB_DESCRIPTION="Ubuntu 15.04"

top - 19:17:38 up 45 min,  2 users,  load average: 3.52, 2.91, 1.93
Tasks:  87 total,   3 running,  84 sleeping,   0 stopped,   0 zombie
%Cpu(s): 36.4 us, 13.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 50.3 st
KiB Mem:    499128 total,   492924 used,     6204 free,     8552 buffers
KiB Swap:  1048572 total,    18664 used,  1029908 free.   136340 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
11405 root      20   0  127120 124140   1360 R 36.6 24.9   3:19.84 memtester
11425 root      20   0  168080 165052   1340 R 36.0 33.1   3:09.11 memtester
11797 root      20   0    7012   2724   1660 D 16.6  0.5   0:54.64 dd
12110 root      20   0    7012   2792   1728 D  7.7  0.6   0:01.19 dd
 1190 root      20   0       0      0      0 D  2.0  0.0   0:00.98 kworker/u30+
   32 root      20   0       0      0      0 S  1.0  0.0   0:01.62 kswapd0
    1 root      20   0   35032   2020   1476 S  0.0  0.4   0:01.68 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.00 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S  0.0  0.0   0:00.67 rcu_sched
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh

[...]

Total DISK READ :       3.88 K/s | Total DISK WRITE :      67.48 M/s
Actual DISK READ:       3.88 K/s | Actual DISK WRITE:      65.99 M/s
  TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND
12110 be/4 root        0.00 B/s   67.48 M/s  0.00 % 86.28 % dd if=/de~0096 bs=1M
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
    7 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]

[...]

Peet (peet44) wrote on 2016-01-10:

#52

This bug kinda makes me curious. I tried to reproduce this issue on a VM but I couldn't get kswapd0 into the 100% CPU state.
Isn't it possible to use the kernel's lockdep feature to automatically detect the issue, if it is a software lock or something like that?

Sam Lade (sam-sentynel) wrote on 2016-01-10:

#53

For clarity here: I have not tested 15.04 and I can entirely believe that the issue doesn't appear on that version. I'm testing the specified kernel versions on Ubuntu 15.10 and the issue is reliably reproducible. It doesn't seem to depend on whether swap is enabled or how full the RAM is; large (~256MB) file writes are sufficient to provoke it. You'll see kswapd0 CPU usage spiking while writing large files; it doesn't always get stuck on 100%, but it's more likely to if the system has other disk activity (run another dd at the same time, grep a bunch of files, etc).
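For anyone who wants to try the large-write trigger described above, here is a minimal sketch. It assumes GNU dd; the function name, the target path and the use of /dev/zero as input are illustrative assumptions, not the exact commands used in this thread. Only the ~256MB size comes from the comment.

```shell
#!/bin/sh
# Sketch only: write one large file so that kswapd0 can be watched in top
# from a second terminal. Assumes GNU dd (for bs=1M and conv=fsync).

# write_test_file FILE [MEGABYTES]
# Writes MEGABYTES (default 256) of zeros to FILE in 1 MB blocks and
# flushes the data to disk before returning.
write_test_file() {
    dd if=/dev/zero of="$1" bs=1M count="${2:-256}" conv=fsync 2>/dev/null
}
```

Usage: run `write_test_file /tmp/kswapd-test.img` in one terminal and `top` in another. Starting a second writer at the same time (`write_test_file /tmp/kswapd-test2.img &`) adds the extra disk activity mentioned above.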

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-11:

#54

I've tested 15.10 with different kernel versions, including mainline versions, and I can reproduce the bug on the following versions: 4.4.0-040400-generic, 4.2.0-23-generic, 4.0.1-040001-generic, 3.19.8-031908-generic, 3.19.0-43-generic.

As the latest 15.04 also gives the symptoms, I suspect it's not caused by a specific kernel version. After downgrading and upgrading different packages between 15.04 and 15.10, I've narrowed it down to udev.

Could you all please try downgrading udev (and libudev1) to version 219-7ubuntu6 and see if the symptoms disappear? Note that you have to restart after the downgrade.

mm (mtl-0) wrote on 2016-01-11:

#55

Thanks a lot for your hard work!
I tested it with Amazon EC2 15.10 and it seems that downgrading fixed it.
Now testing my Xubuntu Home Server.

mm (mtl-0) wrote on 2016-01-11:

#56

Seems also to be fixed on my Xubuntu Home Servers. But I will report back later.

mm (mtl-0) wrote on 2016-01-11:

#57

Update: Sadly not fixed for my Home Servers:

top - 19:19:36 up 45 min,  1 user,  load average: 3.95, 3.81, 3.46
Tasks: 175 total,   4 running, 171 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us, 44.8 sy,  1.8 ni, 14.3 id, 37.6 wa,  0.0 hi,  0.5 si,  0.0 st
KiB Mem:   1872072 total,  1847332 used,    24740 free,     7704 buffers
KiB Swap:  5130332 total,    75924 used,  5054408 free.  1048580 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
   39 root      20   0       0      0      0 R 92.4  0.0  35:48.75 kswapd0
 2243 root      20   0       0      0      0 S  5.8  0.0   0:29.69 kworker/u4:1
 2315 root      39  19 2367016 455472   6272 S  5.8 24.3   1:44.55 java
    1 root      20   0   37832   3808   1828 S  0.0  0.2   0:01.92 systemd
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:04.14 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S  0.0  0.0   0:02.52 rcu_sched
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 R  0.0  0.0   0:02.38 rcuos/0
   10 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcuob/0

Sam Lade (sam-sentynel) wrote on 2016-01-11:

#58

I can't reproduce with the older version of udev either. However, I think this is still a kernel issue, apparently triggered by an interaction with a recent change in udev. Finding the change to udev that causes this could be instructive for figuring out what the underlying kernel issue is, but I disagree that this isn't related to specific kernel versions: as demonstrated above, I have narrowed the point at which this starts happening down to within a few revisions of the kernel, with the same version of udev.

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-11:

#59

Agreed. That means we may have two possible leads for finding the culprit.

Are there any seasoned kernel and systemd/udev people who could look at the changelog and make a guess at which commits we should test? The alternative is to continue to bisect the kernel and also start bisecting udev.

Sam Lade (sam-sentynel) wrote on 2016-01-13:

#60

Attachment: ftrace of kswapd0 exhibiting issue (1.1 MiB, text/plain)

So, I've had an exciting evening armed with ftrace and a kernel debugger. I'm questioning my own sanity a bit here, but I'm fairly sure at this point that setting ftrace going and loading the kernel debugging module (I'm using kgdboe here because I'm on AWS) stop whatever the issue is from happening: kswapd0's CPU usage doesn't even increase with them enabled, never mind run out of control. However, it *is* possible to trigger the issue and then enable ftrace or load the debug module and break in. I haven't found the debugger super useful, though I did manage to break into kswapd0 while it was doing things at one point, so it's potentially an option if necessary. ftrace looked more useful; I've attached a trace of an arbitrary time slice taken with the issue happening. Happy to do extra work with either of these if anyone has further ideas.
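The exact ftrace settings used for the attached trace are not stated. As a rough sketch of how such a time slice could be captured, the function below drives the standard tracefs control files (`current_tracer`, `set_ftrace_pid`, `tracing_on`, `trace`). The function name, the choice of the `function` tracer and the 5-second window in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: capture a function trace limited to one PID.
# trace_pid TRACING_DIR PID SECONDS OUTFILE
# TRACING_DIR is normally /sys/kernel/debug/tracing (needs root).
trace_pid() {
    tdir=$1; pid=$2; secs=$3; out=$4
    echo "$pid" > "$tdir/set_ftrace_pid"     # restrict tracing to this PID
    echo function > "$tdir/current_tracer"   # select the function tracer
    echo 1 > "$tdir/tracing_on"              # start recording
    sleep "$secs"
    echo 0 > "$tdir/tracing_on"              # stop recording
    cat "$tdir/trace" > "$out"               # save the captured slice
    echo nop > "$tdir/current_tracer"        # switch the tracer back off
}
```

Usage, once kswapd0 is already spinning: `trace_pid /sys/kernel/debug/tracing "$(pgrep -x kswapd0)" 5 /tmp/kswapd0.trace`. Note the observation above that starting ftrace before triggering the issue seemed to prevent it.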

Joseph Salisbury (jsalisbury) wrote on 2016-01-14:

#61

The bisect indicated the following commit as the first bad commit:
commit 519192aaae38e24d6b32d3d55d791fe294981185
Author: Thomas Huth <email address hidden>
Date: Mon Sep 9 17:32:56 2013 +0200

KVM: Add documentation for kvm->srcu lock

However, this commit is only a documentation change, so it's probably not the real cause of the issue. We may need to go back through and test all the bisect test kernels again to see if we marked good or bad incorrectly.

Sam Lade (sam-sentynel) wrote on 2016-01-14:

#62

I found an issue with the image I've been cloning test VMs from which may have affected test results. I've got my own kernel builds set up now and am rerunning the bisect. Will let you know what I find.

Marten van Wezel (g-u5untu-y) wrote on 2016-01-15:

#63

FWIW this affects me too. I set up pretty much exactly the same installation (Ubuntu + Kodi + Steam) on two Mac minis, two days ago.

You can compare the devices if you wish - MacMini 4,1 and 5,1 respectively. 4,1 works fine, 5,1 does not. https://en.wikipedia.org/wiki/Mac_Mini#Specifications_3

Notable differences: Intel HD 3000 video on the broken 5,1 device, NVidia GT320M on working 4,1 device. Some other things as well.

Also, it seemed to coincide a few times with 'video card state changes', i.e. the screen saver turning on, etc. Could be total coincidence.

Marten van Wezel (g-u5untu-y) wrote on 2016-01-16:

#64

As an addendum, is there a quick fix for this? Meaning: can I revert to kernel 3.x-something easily in Ubuntu? Or are there all types of unwelcome side effects to dropping from new-hotness to old-coldness, kernel-wise?

Signed, a pro Linux sysadmin for many years back in the 2000s, who has only recently gotten back into it for home purposes, so I'm completely behind on how kernel stuff fits together with things. (Where are my /dev/ devices, sob.)

Sean Groarke (sgroarke) wrote on 2016-01-16:

#65

Marten

Not a fix, but a workaround. (Well, of course every server is different, but it works well for me for now!)

Create in /etc/cron.hourly/ something like kswapd.
chmod +x kswapd

Then into it drop:
echo 1 > /proc/sys/vm/drop_caches

You'll get a lot of
[649613.095072] kswapd (29670): drop_caches: 1
in your dmesg/syslog.

But for me, this works great. Of course it depends on the bug not occurring within the first hour of the "fix" being applied (i.e. cron.hourly). And clearly this will vary from box to box. But on my AWS server (low traffic web-server with a few other odds and sods on it) this has effectively "solved" this issue enough, until we get a full fix.

Sean
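The steps above can be put together as one small installer. This is a sketch, not Sean's actual file: the function name is made up, and the `#!/bin/sh` line is an addition, on the assumption that cron's run-parts should be handed a proper script.

```shell
#!/bin/sh
# Sketch only: create the hourly job described above.
# install_drop_caches_job [DIR]
# Writes DIR/kswapd (DIR defaults to /etc/cron.hourly) and marks it
# executable. The job drops the page cache once per run.
install_drop_caches_job() {
    dir=${1:-/etc/cron.hourly}
    job="$dir/kswapd"
    printf '%s\n' \
        '#!/bin/sh' \
        'echo 1 > /proc/sys/vm/drop_caches' > "$job"
    chmod +x "$job"
}
```

Run `install_drop_caches_job` as root to install into /etc/cron.hourly. Writing 1 to drop_caches frees the page cache only, so the cost is extra disk reads afterwards rather than lost data.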

Marten van Wezel (g-u5untu-y) wrote on 2016-01-16:

#66

Egads, the thick plottens... turns out I have some type of error in my system. In particular:

marten@MacMini-New:/usr/local/bin$ cat /proc/meminfo |grep MemTotal
MemTotal: 859916 kB

Yes, you read that right: 860 MB. That's clearly incorrect. I'll go and see if I can put in something better. For this thread though this might be a good hint on how to debug things.

Sean- To your point - Yeah I have a 'fixkswapd' script that says:

marten@MacMini-New:/usr/local/bin$ cat fixkswapd
schedtool -D -n 19 `pidof kswapd0`
echo 1 > /proc/sys/vm/drop_caches

Maybe my system has an *actual* memory problem rather than a bug. I'll report back if/when I fix this.

Marten van Wezel (g-u5untu-y) wrote on 2016-01-16:

#67

Instant update:
So indeed I had a broken memory module. I had installed 2x1GB, but only 1 showed. (The 860 is due to the video controller using 200-odd MB of system RAM.)

Now back up and running with 4GB. Will report back if I encounter the issue again. (I haven't put the crontab fix in place, so I'll be able to notice the issue if it happens. If not, perhaps my stumbling about might've given you guys a few data points: perhaps broken RAM is the cause, or perhaps ultra-low RAM.)

Dennis Stevense (decafdennis) wrote on 2016-01-17:

#68

A workaround that seems to work for me (and one that doesn't involve dropping caches periodically) is to downgrade udev to 219-7ubuntu6 (as suggested by Øystein above). On 15.10 wily, add vivid-updates to your apt sources to be able to downgrade to this version of udev.

I noticed this bug is more likely to occur on servers with less memory / more memory pressure. I was only able to reproduce it on our t2.micro and t2.small instances on EC2, and using larger EC2 instances was our first workaround. (In fact, after downgrading udev I was also able to downgrade to smaller EC2 instances again.)

This bug also seems more likely to occur on servers with faster disk I/O, given I was only able to reproduce it on instances with gp2 (SSD) storage and not on standard (magnetic) storage on EC2.

Our use case is a bunch of EC2 instances all using the Ubuntu 15.10 wily AMI, and currently running the latest 4.2.0-23-generic kernel.
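A sketch of the downgrade itself, assuming vivid-updates has already been added to the apt sources as described above (the repository line is left out on purpose, since the right mirror depends on your setup). The function names and the dry-run switch are illustrative, and the `apt-mark hold` step is an addition not mentioned in this thread, included so a later upgrade does not undo the downgrade.

```shell
#!/bin/sh
# Sketch only: downgrade udev and libudev1 to a fixed version.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# downgrade_udev [VERSION]
# VERSION defaults to 219-7ubuntu6, the version named in this thread.
downgrade_udev() {
    version=${1:-219-7ubuntu6}
    run apt-get update
    run apt-get install "udev=$version" "libudev1=$version"
    run apt-mark hold udev libudev1   # assumption: keep upgrades from reverting it
}
```

`DRY_RUN=1 downgrade_udev` prints the commands without touching the system; run `downgrade_udev` as root to apply them, then reboot, as noted earlier in the thread.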

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-17:

#69

I've bisected systemd and found the commit that triggers the bug. http://bazaar.launchpad.net/~thopiekar/systemd/systemd-packaging-ubuntu-wily/revision/1706 seems to be the one. The commit is a bugfix that effectively enables hotadd for Xen. That can explain why more or less only Ubuntu 15.10 is affected, but the underlying mechanism is in the kernel.

A working workaround on the latest 15.10 is to comment out line 2 (ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply") in /lib/udev/rules.d/40-vm-hotadd.rules

Revision history for this message

Øystein Gisnås (oystein-gisnas) wrote on 2016-01-17:

#70

Could someone with good udev/kernel knowledge try to debug what's going wrong with this hotadd? I've tried to run systemd-udevd in debug mode, but it didn't show anything of interest. https://wiki.ubuntu.com/DebuggingUdev is outdated, and I don't know how to go on with debugging.

Revision history for this message

Nelson Elhage (nelhage) wrote on 2016-01-17:

#71

I did some experimentation, and I can reproduce @oystein-gisnas's result that nuking that file from `/lib/udev/rules.d/` and rebooting fixes the issue.

However, removing that file and restarting udev does *not* seem to fix the issue. So I suspect the problem is not with udev, but rather with some kernel feature that udev is enabling or toggling as part of processing that file.

Also, as an aside, editing that file isn't a good workaround, since upgrading udev may clobber it again. If you instead create an `/etc/udev/rules.d/40-vm-hotadd.rules` containing the edited version, that file will take precedence over the one in `/lib`, and also be robust against udev upgrades.

Revision history for this message

Dennis Stevense (decafdennis) wrote on 2016-01-17:

#72

Confirmed Nelson Elhage's workaround: copy /lib/udev/rules.d/40-vm-hotadd.rules to /etc/udev/rules.d/ and comment out line 2 and reboot.
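
A minimal sketch of the override described in the last few comments (an edited copy of the rules file in /etc/udev/rules.d/ with line 2 commented out), written in Python. The helper names `comment_out_line` and `write_override` are made up for illustration; only the two paths and the line number come from this thread, and writing under /etc needs root.

```python
from pathlib import Path

# Paths as given in the comments above.
SRC = Path("/lib/udev/rules.d/40-vm-hotadd.rules")
DST_DIR = Path("/etc/udev/rules.d")


def comment_out_line(text: str, line_no: int) -> str:
    """Return text with 1-based line `line_no` commented out (idempotent)."""
    lines = text.splitlines(keepends=True)
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"file has no line {line_no}")
    if not lines[line_no - 1].lstrip().startswith("#"):
        lines[line_no - 1] = "#" + lines[line_no - 1]
    return "".join(lines)


def write_override(src: Path = SRC, dst_dir: Path = DST_DIR, line_no: int = 2) -> Path:
    """Write an edited copy of `src` into `dst_dir`; the original stays untouched."""
    dst = dst_dir / src.name
    dst.write_text(comment_out_line(src.read_text(), line_no))
    return dst
```

Calling `write_override()` as root and then rebooting would correspond to the steps above. Line numbers can shift between udev versions, so it is probably worth checking that line 2 is still the Xen rule before relying on this.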

Revision history for this message

Nelson Elhage (nelhage) wrote on 2016-01-17:

#73

Some further debugging: On my t2.micro test case, the machine comes up with 9 memory devices, the last of which is offline at boot. Bringing that device online by hand (with the udev rules disabled) triggers the bug.

# echo online > /sys/devices/system/memory/memory8/state

So something about bringing that node online causes the problem. The udev bugfix causes udev to automatically bring it up, which is why that introduced the issue, but it doesn't seem like udev is really the root problem here.
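
To see which memory devices are offline before bringing one online by hand, a small read-only sketch like the following may help. It assumes the sysfs layout implied by the `echo` command above (one `memoryN` directory with a `state` file per device); the function names are hypothetical.

```python
from pathlib import Path


def memory_block_states(base: str = "/sys/devices/system/memory") -> dict:
    """Map each memoryN block under `base` to the contents of its state file."""
    states = {}
    for state_file in Path(base).glob("memory[0-9]*/state"):
        states[state_file.parent.name] = state_file.read_text().strip()
    return states


def not_online(states: dict) -> list:
    """Names of blocks whose state is anything other than 'online', in numeric order."""
    names = [name for name, state in states.items() if state != "online"]
    return sorted(names, key=lambda name: int(name[len("memory"):]))
```

On a machine like the t2.micro described above, `not_online(memory_block_states())` would be expected to list the one device that is offline at boot; this only reads state and does not change anything.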

Revision history for this message

In Linux Kernel Bug Tracker #65201, serianox (serianox-linux-kernel-bugs) wrote on 2016-01-19:

#162

Same problem here on a C720P Chromebook; it happens on several different distros such as Arch, Ubuntu and Xubuntu. I downgraded to the 4.1.x kernel and the issue became less frequent (it needs much more memory pressure to trigger). Then I downgraded to the 3.17 kernel and the issue was gone completely. None of the previous suggestions and workarounds worked for me; only downgrading the kernel did.

Joseph Salisbury (jsalisbury) on 2016-01-19

Changed in linux (Ubuntu):
status:	In Progress → Confirmed
assignee:	Joseph Salisbury (jsalisbury) → nobody

Revision history for this message

Nelson Elhage (nelhage) wrote on 2016-01-19:

#74

One more note to anyone else trying to debug this: I can reproduce quite reliably by copying a 3GiB file from S3 onto a gp2 EBS volume using `aws s3 cp`.

Revision history for this message

mm (mtl-0) wrote on 2016-01-24:

#75

As part of my workaround/fix I downgraded udev and upgraded RAM from 3GB to 8GB on my homeserver.

Now I am experiencing the following which can't be normal either:

top - 13:50:48 up 1 day, 19:29, 1 user, load average: 0,11, 0,15, 0,14
tasks: 200 total, 1 running, 198 sleeping, 0 stopped, 1 zombie
%Cpu(s): 6,1 be, 2,5 sy, 0,2 ni, 81,6 un, 9,1 wa, 0,0 hi, 0,4 si, 0,0 st
KiB Mem: 8067248 total, 4768336 used, 3298912 free, 477524 buffers
KiB Swap: 4194300 total, 4170948 used, 23352 free. 2803672 cached Mem

Filename Type Size Used Priority
/var/cache/swap/swap0 file 4194300 4170948 -1

Linux ##### 4.2.0-25-generic #30-Ubuntu SMP Mon Jan 18 12:31:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Any idea? I hope the system won't crash ....

Revision history for this message

In Linux Kernel Bug Tracker #65201, liststuff (liststuff-linux-kernel-bugs) wrote on 2016-02-09:

#163

Same problem here on Acer C720 Chromebook. I have 2GB of swap space on the SSD (I replaced the original 16GB M2 SSD with a 256GB version) and whenever swap is used I get this problem.

Linux localhost 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.10
Release: 15.10
Codename: wily

`echo 3 > /proc/sys/vm/drop_caches` (1 isn't enough) works around the issue for me too.

Revision history for this message

In Linux Kernel Bug Tracker #65201, sakhnik (sakhnik-linux-kernel-bugs) wrote on 2016-02-09:

#164

I haven't suffered from the bug since I compiled the kernel myself: https://aur.archlinux.org/packages/linux-c720/ . Apparently I compiled out something that was causing the trouble, but I didn't try to bisect what the culprit was.

Revision history for this message

In Linux Kernel Bug Tracker #65201, serianox (serianox-linux-kernel-bugs) wrote on 2016-02-09:

#165

(In reply to Anatoli Sakhnik from comment #11)
> I didn't suffer from the bug since compiled kernel myself:
> https://aur.archlinux.org/packages/linux-c720/ . Apparently, I compiled out
> something causing the trouble, but I didn't try to bisect what was the
> culprit.

This bug seems to affect 2GB models only. Do you have the 2GB or 4GB version? What changes did you make to your kernel?

Revision history for this message

In Linux Kernel Bug Tracker #65201, sakhnik (sakhnik-linux-kernel-bugs) wrote on 2016-02-09:

#166

Mine is 2G. I didn't change anything in the kernel source code, but switched off many options in the config file: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .

Even today, if I boot stock arch kernel, the bug regresses; if I boot linux-c720, kswapd0 is still. In theory, I could experiment with different configurations in between stock's and mine to triage the issue.

Revision history for this message

In Linux Kernel Bug Tracker #65201, serianox (serianox-linux-kernel-bugs) wrote on 2016-02-09:

#167

perhaps you removed something related to
http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html ?
also relevant:
https://github.com/GalliumOS/galliumos-distro/issues/52#issuecomment-174261443

Revision history for this message

In Linux Kernel Bug Tracker #65201, sakhnik (sakhnik-linux-kernel-bugs) wrote on 2016-02-09:

#168

I have no idea yet.

Revision history for this message

In Linux Kernel Bug Tracker #65201, ponymarzanna (ponymarzanna-linux-kernel-bugs) wrote on 2016-02-10:

#169

To avoid this bug I installed ChromeOS on my C720 (with 2GB RAM). I was happy with performance. Until today. I noticed lags. For some reason this bug appeared suddenly. There was no update. Kernel version is 3.8.11. Stock ChromeOS kernel.

Revision history for this message

In Linux Kernel Bug Tracker #65201, serianox (serianox-linux-kernel-bugs) wrote on 2016-02-14:

#170

(In reply to Anatoli Sakhnik from comment #13)
> Mine is 2G. I didn't change anything in the kernel source code, but switched
> off many options in the config file:
> https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .
>
> Even today, if I boot stock arch kernel, the bug regresses; if I boot
> linux-c720, kswapd0 is still. In theory, I could experiment with different
> configurations in between stock's and mine to triage the issue.

Could you please share your kernel configuration so I can try your AUR package and solve this issue once and for all :) ? Thanks in advance.

Revision history for this message

In Linux Kernel Bug Tracker #65201, sakhnik (sakhnik-linux-kernel-bugs) wrote on 2016-02-14:

#171

There it is: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720

Revision history for this message

In Linux Kernel Bug Tracker #65201, jonathan (jonathan-linux-kernel-bugs) wrote on 2016-02-15:

#172

We encounter this regularly on AWS, but only on t2.small instances, which indeed are the only ones we run which have 2GB of RAM.

We use the latest Ubuntu 15.10 AMIs as found here https://cloud-images.ubuntu.com/locator/ec2/. Please let me know if we can do anything to help track this down.

Revision history for this message

JockeTF (jocketf) wrote on 2016-02-17:

#76

I'm seeing this issue in Ubuntu Xenial Xerus (development branch).

Revision history for this message

In Linux Kernel Bug Tracker #65201, liststuff (liststuff-linux-kernel-bugs) wrote on 2016-02-21:

#173

The workaround suggested above (echo 3 > /proc/sys/vm/drop_caches) doesn't work consistently for me on kernel 4.2.0 (Ubuntu 15.10) on an Acer C720 Chromebook.

I've found another workaround that works well for me so far: create a file /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents and reboot:
vm.min_free_kbytes=67584

The idea behind this workaround comes from a post by Kirill A. Shutemov on LKML (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this GalliumOS bug report: https://github.com/GalliumOS/galliumos-distro/issues/52

It would be interesting to know if this helps others.
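
For anyone trying this, a rough Python sketch for checking after the reboot that the setting actually took effect. The helper names are hypothetical; the value and file name come from the comment above, and /proc/sys/vm/min_free_kbytes is where the kernel exposes this sysctl.

```python
def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def read_min_free_kbytes(path: str = "/proc/sys/vm/min_free_kbytes") -> int:
    """Value the running kernel is currently using, in kB."""
    with open(path) as f:
        return int(f.read().strip())


def kb_to_mib(kb: int) -> float:
    """Convert kB (as used by the sysctl) to MiB."""
    return kb / 1024
```

Comparing `int(parse_sysctl_conf(conf_text)["vm.min_free_kbytes"])` with `read_min_free_kbytes()` shows whether the file was picked up; `kb_to_mib(67584)` gives 66 MiB, which puts the reserve in perspective on a 2GB machine.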

Revision history for this message

In Linux Kernel Bug Tracker #65201, sgnn7 (sgnn7-linux-kernel-bugs) wrote on 2016-03-04:

#174

Same problem here:
- No swap machine
- Wily (U15.10) - 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- 1GB RAM

- `meminfo` - Should have enough RAM to not swap though buffers do seem high

MemTotal:        1014932 kB
MemFree:          231296 kB
MemAvailable:     871180 kB
Buffers:          580684 kB
Cached:            47812 kB
SwapCached:            0 kB
Active:           547952 kB
Inactive:         164364 kB
Active(anon):      84280 kB
Inactive(anon):     4288 kB
Active(file):     463672 kB
Inactive(file):   160076 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               224 kB
Writeback:             0 kB
AnonPages:         83800 kB
Mapped:            39688 kB
Shmem:              4768 kB
Slab:              48008 kB
SReclaimable:      31172 kB
SUnreclaim:        16836 kB
KernelStack:        1936 kB
PageTables:         3844 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      507464 kB
Committed_AS:     314640 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       13524 kB
VmallocChunk:   34359717628 kB
HardwareCorrupted:     0 kB
AnonHugePages:     49152 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       53248 kB
DirectMap2M:     1126400 kB

- kernel config: https://gist.github.com/sgnn7/cbb41ce21d3a927eca27

- strace shows nothing interesting

- `perf` report:
Samples: 12K of event 'cpu-clock', Event count (approx.): 3245250000
Overhead  Command  Shared Object      Symbol
  19.34%  kswapd0  [kernel.kallsyms]  [k] shrink_lruvec
  17.04%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_iter
   8.60%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_zone_lruvec
   6.57%  kswapd0  [kernel.kallsyms]  [k] shrink_slab
   5.47%  kswapd0  [kernel.kallsyms]  [k] global_dirty_limits
   4.18%  kswapd0  [kernel.kallsyms]  [k] domain_dirty_limits
   3.71%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_get_lru_size
   3.59%  kswapd0  [kernel.kallsyms]  [k] super_cache_count
   3.27%  kswapd0  [kernel.kallsyms]  [k] get_lru_size
   3.26%  kswapd0  [kernel.kallsyms]  [k] throttle_vm_writeout
   2.20%  kswapd0  [kernel.kallsyms]  [k] css_next_descendant_pre
   2.15%  kswapd0  [kernel.kallsyms]  [k] blk_flush_plug_list
   1.96%  kswapd0  [kernel.kallsyms]  [k] shrink_zone
   1.73%  kswapd0  [kernel.kallsyms]  [k] _raw_spin_lock
   1.59%  kswapd0  [kernel.kallsyms]  [k] __list_lru_count_one.isra.2
   1.43%  kswapd0  [kernel.kallsyms]  [k] list_lru_count_one
   1.37%  kswapd0  [kernel.kallsyms]  [k] memcg_kmem_is_active
   1.27%  kswapd0  [kernel.kallsyms]  [k] __raw_callee_save___pv_queued_spin_unlock
...

I'm going to try gdb, changing swappiness, changing vm.min_free_kbytes, and reducing buffer limits, in that order, and report back. Most likely I'll only have one shot before the bug goes away for the next few days.
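
The "should have enough RAM" reading of the meminfo dump above can be checked mechanically. This is only a sketch with made-up helper names; it assumes the usual `Key: value kB` layout of /proc/meminfo and that `MemAvailable` is present, as it is in this dump.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info: dict) -> dict:
    """A few derived numbers that matter when judging memory pressure."""
    return {
        # Committed_AS above CommitLimit means more memory is promised than
        # the limit allows (the limit is only enforced in strict overcommit mode).
        "overcommitted": info["Committed_AS"] > info["CommitLimit"],
        # Block-device buffers plus page cache, in kB.
        "file_cache_kb": info["Buffers"] + info["Cached"],
        # Share of RAM the kernel estimates is available without swapping.
        "available_fraction": info["MemAvailable"] / info["MemTotal"],
    }
```

For the values posted above this gives no overcommit (314640 kB committed against a 507464 kB limit), 628496 kB of buffers plus cache, and roughly 86% of RAM reported as available, which is consistent with the observation that kswapd0 has no obvious reason to be busy.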

Revision history for this message

In Linux Kernel Bug Tracker #65201, sgnn7 (sgnn7-linux-kernel-bugs) wrote on 2016-03-04:

#175

Cont'd from previous post

In order of attempts on a live system:
- gdb didn't work at all since kernel wasn't built w/ debugging flags
- hotload of 10 and 0 swappiness (from 60) didn't make the kswapd process reduce cpu usage
- hotload of vm.min_free_kbytes=64K (from 4K) didn't make the process reduce cpu usage
- hotload of vm.dirty_background_ratio=5 (from 10) didn't make the process reduce cpu usage
- hotload of vm.dirty_ratio=10 (from 20) didn't make the process reduce cpu usage
- hotload of vm.dirty_background_ratio=15 (from 5) didn't make the process reduce cpu usage
- hotload of vm.dirty_ratio=25 (from 10) didn't make the process reduce cpu usage
- live swapon on a new 256MB swapfile didn't reduce process use
- live swapoff and swapon after that also didn't drop cpu usage

Sidenote: We're using Docker so I'm not sure if that is contributing to the situation.

Revision history for this message

Alexander Mashin (alex-mashin) wrote on 2016-03-05:

#77

I confirm that the bug is present on Amazon t2.nano instance with Ubuntu 15.10, kernel 4.2.0-30-generic, and can only be worked around by commenting out line 2 in /lib/udev/rules.d/40-vm-hotadd.rules.

Revision history for this message

JockeTF (jocketf) wrote on 2016-03-05:

#78

The workaround fixes the issue in Xenial Xerus (development branch) as well.

Revision history for this message

Alexander Mashin (alex-mashin) wrote on 2016-03-07:

#79

I don't know if the following comment is relevant to this particular bug, but here it is:
* If there is no swap, the kernel should not try to swap. It is as simple as that. If there is no swap file or partition and kswapd0 shows up in top, even if it doesn't consume 100% of the CPU, it is a bug. Sysadmins are entitled to forbid swapping altogether (because it's evil: on a web server, swapping is effectively a denial of service), and the most natural way to do so is not to mount a swap partition.
* If there is cached memory, there should be no swapping. Memory must be freed by flushing file buffers first. It is ridiculous that the kernel is generous to filesystems, caching their files, but not to the applications that really need memory, allowing them either to swap or to fail on fork. At the moment there is a misleading article somewhere in the Ubuntu knowledge base that claims that cached memory is not "eaten" and is as available as free memory. As long as the kernel is allowed to swap while there are file buffers, the latter are NOT free and available.

Revision history for this message

In Linux Kernel Bug Tracker #65201, cdlscpmv (cdlscpmv-linux-kernel-bugs) wrote on 2016-03-08:

#176

Good news! I was able to get rid of the bug completely by setting the `mem` kernel parameter to a value slightly less than physical memory. I own an Acer C720 (2GB model), and setting `mem=1920M` does the job.

The idea sprung up in my head after reading the aforementioned bug report on github[1]. I hope this might give some clue to the issue.

[1]: https://github.com/GalliumOS/galliumos-distro/issues/52

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#177

Created attachment 208411
ftrace (function_graph)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#178

Created attachment 208421
ftrace (vmscan tracepoints)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#179

Created attachment 208431
/proc/vmstat (time 0)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#180

Created attachment 208441
/proc/vmstat (time 5s)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#181

Created attachment 208451
/proc/zoneinfo

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#182

Created attachment 208461
/proc/pagetypeinfo

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#183

Created attachment 208471
/proc/buddyinfo

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#184

Created attachment 208481
vmstat -m (time 0)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#185

Created attachment 208491
vmstat -m (time 5s)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#186

I am able to semi-reliably reproduce this (or very similar?) problem on a setup very close to one in comment #21

- kernel: 4.2.0-30-generic (ubuntu 15.10)
- 2 GB RAM, 1 CPU, running under Xen (EC2 t2.small instance)
- docker with LVM thin-pool storage backend, running 3 containers, no memory limits set for their memcg's
- server is mostly idling (load average 0.0-0.1)

To reproduce it I have to:

1. set vm.overcommit_memory=1
2. initiate some disk activity:
find / -xdev -type f |xargs -P10 -n1 md5sum &>/dev/null &
find /var/lib/docker -type f |xargs -P10 -n1 md5sum &>/dev/null &

3. run some memory allocations until you hit OOM
for x in {1..200}; do ./memalloc & : ; done

memalloc above is a simple C program which allocates 100MB and memsets it with 'x':

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

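int main(void)
{
  int block_mb = 100;
  /* 1024 * 1000 bytes per "MB", i.e. 102,400,000 bytes in total */
  size_t size = (size_t)block_mb * 1024 * 1000;
  char *buf;

  printf("allocing %dMB: ", block_mb);
  buf = malloc(size);
  if (!buf) {
    printf("FAILED!\n");
    exit(EXIT_FAILURE);
  }
  printf("ok\n");
  /* touch every page so the memory is really backed, not just reserved */
  memset(buf, 'x', size);
  /* hold on to the memory for three minutes before exiting */
  sleep(180);
  return 0;
}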

Once you hit OOM and the console slows down, it is time to press CTRL+C, run pkill memalloc and then check top. Many times `kswapd0` spins and then recovers within tens of seconds, but once in a while it stays there for hours (I didn't have the patience to check for longer).

Once I had triggered the bug, I tried to get as much information as possible from the running system. I am attaching /proc/*info files (some taken 5 s apart), ftrace output for the event tracer (vmscan events only), and ftrace output for the function_graph tracer. Let me know if you need more information.

To recover from the situation, enough memory needs to be freed in a short period of time. Sometimes dropping caches helps, sometimes I needed to close applications/containers as well, but I never had to reboot to recover.

Revision history for this message

In Linux Kernel Bug Tracker #65201, ivanov.maxim (ivanov.maxim-linux-kernel-bugs) wrote on 2016-03-09:

#187

It would be very helpful if there were a way to get output similar to the ftrace function_graph tracer, but with function arguments and return values. From the look of it, `pgdat_balanced` for some reason keeps returning false even though /proc/zoneinfo shows that the number of free pages is much higher than any watermark.

Problem description and recovery method very closely resembles discussion around kernel 3.7 (https://lkml.org/lkml/2012/11/28/88):

> The zonelist reclaim in kswapd would do
> nothing because all high watermarks are met, but the compaction logic
> would find its own requirements unmet and loop over the zones again.
> Indefinitely, until some third party would free enough memory to help
> meet the higher compaction watermark.

Revision history for this message

Greg Fefelov (gfv) wrote on 2016-03-10:

#80

Alexander,

please note that if a kernel process name contains "swap" as a substring, it does not immediately mean that this process exclusively does process memory swap-in/swap-out. kswapd is a kernel process name for an important piece of memory management subsystem: it frees pages by flushing them to disk or discarding when the system is low on memory, saving you the trouble of OOMs. This includes both buffer cache and process memory; however, disk buffers are freed first.

Generally that means that on a system with disabled swap you _will_ see high cpu% in kswapd when you have almost no memory and buffers have not been synced to disk yet. This is not a bug (but you should consider installing more memory or having a stricter sync policy).

However, this discussion is very off topic, since this bug is not an intended behavior.

Revision history for this message

Reupen Shah (reupen) wrote on 2016-03-20:

#81

This has also been affecting me on a t2.micro EC2 instance with Ubuntu 15.10, most recently occurred with:

4.2.0-30-generic #36-Ubuntu SMP Fri Feb 26 00:58:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

For me it often triggers when installing updates (probably kernel ones in particular). Otherwise the server is not doing much so probably not much of a surprise it doesn't trigger on other occasions.

Revision history for this message

Islam (islam) wrote on 2016-04-12:

#82

This is also affecting Ubuntu 15.10 Kernel 4.2.0-35-generic
I have 700MB free RAM 400 MB Cached and kswapd0 is taking more than 99% of the CPU.

Revision history for this message

antgel (antgel) wrote on 2016-04-20:

#83

After reading 82 comments, I'm not sure if this is a kernel or udev (or other) bug. Any clues if anyone's working on this in relevant upstreams?

Revision history for this message

mallardquacken (mallard) wrote on 2016-04-23:

#84

confirmed on released version of 16.04 LTS desktop
# uname -a
Linux laptop 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message

mm (mtl-0) wrote on 2016-04-23:

#85

I can also confirm the bug is still present in Xubuntu 16.04 LTS desktop

4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message

mm (mtl-0) wrote on 2016-04-27:

#86

I can confirm this bug is only triggered when the machine has 2-3GB of RAM or less. I have two identical machines running Ubuntu 16.04. One has 8GB of RAM, and on this machine kswapd0 doesn't deadlock at 100% CPU usage. On my 2GB RAM machine the issue is worse than under Ubuntu 15.10 ....

Revision history for this message

Andreas E. (andreas-e) wrote on 2016-04-27:

#87

In the earliest linked report of this bug the deadlock also occurred with 8GB RAM (and that is still the case in 15.10). Will test 16.04 later when it stabilizes.

Revision history for this message

erik (erik-eriksson) wrote on 2016-04-28:

#88

I can see this bug on a system recently upgraded from Ubuntu 15.04 to 16.04.
I did not see this behaviour before upgrading.
The system is an Intel NUC Desktop with 8 GB RAM and kswapd0 completely locks up the system after a while.

Revision history for this message

In Linux Kernel Bug Tracker #65201, hdefendme (hdefendme-linux-kernel-bugs) wrote on 2016-04-30:

#188

(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.
> Dropping caches *does* help:
>
> # echo 3 > /proc/sys/vm/drop_caches # 1 isn't enough
>
> Next my guess would be to try to deactivate zswap.

The above workaround works for me on kernel 4.4.2, Debian jessie.

The bug happens randomly after heavy web browsing on kernel 4.5.
Downgrading to the 3.16 stable jessie kernel: bug gone.
Upgrading to 4.4.2: the bug came back.

Revision history for this message

Impulse (kristal-plus) wrote on 2016-05-06:

#89

Confirm.
Linux impulse-X55VD 4.6.0-040600rc6-generic #201605012031 SMP Mon May 2 00:33:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message

Gregg King (greggking) wrote on 2016-05-12:

#90

I'm also seeing this on 16.04 on an AWS nano (~0.5GB RAM). More than happy to help troubleshoot this if people let me know what info is needed.

Here's some basic info from my machine:

$ uname -a
4.4.0-22-generic #39-Ubuntu SMP Thu May 5 16:53:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ free -m
              total        used        free      shared  buff/cache   available
Mem:            486          73         140          20         272         362
Swap:             0           0           0

$ top
top - 01:44:48 up 3 days, 2:34, 1 user, load average: 1.00, 1.01, 1.05
Tasks: 131 total, 2 running, 129 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 90.9 sy, 0.0 ni, 9.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 498480 total, 143612 free, 75852 used, 279016 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 370732 avail Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
   29 root 20 0 0 0 0 R 99.9 0.0 4066:36 kswapd0
    1 root 20 0 37908 6028 4048 S 0.0 1.2 0:08.79 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
----- CUT THE REST -----

Revision history for this message

François Leurent (131-ubuntu) wrote on 2016-05-14:

#91

Same issue here, uname -a
4.4.0-22-generic #39-Ubuntu SMP Thu May 5 16:53:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
free -m
              total        used        free      shared  buff/cache   available
Mem:           1836          68        1608          12         159        1722
Swap:          1949           0        1949
(I'll edit this in a few minutes with the top result; uptime is 0s)

Revision history for this message

mb (doitnow) wrote on 2016-05-19:

#92

Screenshot from 2016-05-19 15:25:29.png (45.4 KiB, image/png)

Fresh install of 16.04 on EC2 Nano instance, and kswapd0 takes most of the CPU when running a job with high CPU and high IO.

I've resorted to running the following command every minute as a cron task:

# m h dom mon dow command
* * * * * echo 3 > /proc/sys/vm/drop_caches

That works for 10 or 15 seconds, but after that kswapd0 comes back and hogs the CPU again.

I'm using less than 20% of available memory, shouldn't need to swap.

See attached screenshot of top at the 25-second mark after dropping caches.

Revision history for this message

mm (mtl-0) wrote on 2016-05-26:

#94

I can confirm the following hotfix has worked for me for several days on a 2GB RAM system:

echo "vm.min_free_kbytes=67584" > /etc/sysctl.d/60-workaround-kswapd-allcpu.conf

after that: reboot.

Revision history for this message

mm (mtl-0) wrote on 2016-05-29:

#95

I can confirm after 6 days of uptime: my hotfix (echo "vm.min_free_kbytes=67584" > /etc/sysctl.d/60-workaround-kswapd-allcpu.conf) still works.

Revision history for this message

MintPaw (jerusanders) wrote on 2016-05-30:

#96

I can confirm that setting vm.min_free_kbytes=67584 on my 2gb chromebook does not work for me (https://i.imgur.com/I7vEE5C.png)

After restarting and running heavy processes, kswapd0 still uses 100% CPU until I reboot or run the following, very unfortunate script.

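#!/bin/bash
# Re-run under sudo when not already root; exec hands over to the root copy,
# so the rest of this script never runs unprivileged.
if [ "$EUID" -ne 0 ]; then
    echo "Rerunning as root"
    exec sudo "$0" "$@"
fi

swapoff -a                          # disable swap, pulling swapped pages back into RAM
sleep 1
echo 3 > /proc/sys/vm/drop_caches   # drop page cache plus dentries and inodes
sleep 1
swapon /dev/sda2                    # re-enable swap (partition specific to this machine)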
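
The script above hard-codes /dev/sda2, which will differ between machines. One possible variation is to record the active swap areas from /proc/swaps before disabling them. This is a sketch only: `active_swap_devices` is a hypothetical helper, and it assumes swap paths without spaces.

```shell
#!/bin/bash
# Sketch only: list the currently active swap devices or files.
# active_swap_devices is a hypothetical helper; the optional argument lets it
# read a saved copy of /proc/swaps instead of the live file.
active_swap_devices() {
    local swaps="${1:-/proc/swaps}"
    # Skip the header line; the first column is the device or file path.
    awk 'NR > 1 { print $1 }' "$swaps"
}

# Usage (as root), replacing the fixed "swapon /dev/sda2":
#   devices=$(active_swap_devices)
#   swapoff -a
#   echo 3 > /proc/sys/vm/drop_caches
#   for dev in $devices; do swapon "$dev"; done
```

If every swap area is listed in /etc/fstab, `swapon -a` would be a simpler way to re-enable them.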

Revision history for this message

Haw Loeung (hloeung) wrote on 2016-05-31:

#97

Out of interest, does this fix it?

| for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do sudo sh -c "echo performance > $i"; done

If so, was the scaling governor set to "ondemand" by any chance?

Revision history for this message

MintPaw (jerusanders) wrote on 2016-06-01:

#98

They seem to be set to "powersave", which is kinda ironic. I've set them to performance and will report back later.

Revision history for this message

MintPaw (jerusanders) wrote on 2016-06-02:

#99

Nope, setting it to performance doesn't seem to help.

Revision history for this message

MintPaw (jerusanders) wrote on 2016-06-04:

#100

Actually, this is even a problem with no active swap partition. What is the actual problem here? Is kswapd0 just running for no reason?

Revision history for this message

Jonathan Vargas (jvargas-alkaid) wrote on 2016-06-04:

#101

Selection_320.png (237.7 KiB, image/png)

I face the same issue, running a t2.micro (1 GB) for an Nginx/Ruby application with really low demand. The kswapd0 process takes the CPU to 100% and overall system performance is degraded.

Linux server 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

To reproduce it, I just create a 1GB file in the filesystem using "dd":

dd if=/dev/zero of=temp bs=1k count=1024k

I created a swap file to solve the issue but it continues. After dropping the page cache with:

echo 3 > /proc/sys/vm/drop_caches

the issue is solved temporarily.

Revision history for this message

Gregg King (greggking) wrote on 2016-06-04: Re: [Bug 1518457] Re: kswapd0 100% CPU usage

#102

I've had the issue mostly when running with no swap. I added 1GB swap file and it fixed it on the machine with less memory usage, but the machine with the database (using more memory most of the time) still has the bug occur about once a day.

Ubuntu 16.04 for both, not happening on 14.04.

Revision history for this message

MintPaw (jerusanders) wrote on 2016-06-07:

#103

I've fixed this on Arch Linux by moving to the 4.6.1-2-ARCH kernel; I was previously on a 4.5.? kernel. The problem also doesn't occur on either the 4.0.7-2 or the 3.16-2 kernel, for me at least.

I hear this is happening mostly on the 4.4.0-* kernels, which don't seem to be available in the Arch Linux archive (https://archive.archlinux.org/packages/l/linux/); in fact, none of the *.*.0-* kernels are. If you can report an affected kernel that's in that list, I can test it. I have a feeling that simply reinstalling any kernel would have fixed it, and that I had some sort of configuration problem before.

Revision history for this message

MintPaw (jerusanders) wrote on 2016-06-08:

#104

Whoops, it does seem to happen on 4.6.1-2, although the conditions seem to be different. Does anyone have a sure-fire repro case?

Revision history for this message

Thorsten von Eicken (tve-rightscale) wrote on 2016-06-08:

#105

Is the ability to repro a hold-up on this issue? I can spend time to provide STRs on an EC2 instance using the latest official 16.04 image, but I don't want to go through the effort if repro isn't really a hold-up.

Revision history for this message

tomtom (tbjornli) wrote on 2016-06-09:

#106

I can confirm that this issue also exists on AWS c4.large (2 vCPUs, 3.75 GiB of memory).

$ uname -a
Linux server 4.4.0-22-generic #39-Ubuntu SMP Thu May 5 16:53:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3762        1307        1204          69        1251        2321
Swap:          2047          39        2008

Revision history for this message

Andrew Noruk (p-andrew-s) wrote on 2016-06-09:

#107

I'm also facing the kswapd0 99% CPU issue. Typically, I notice it starting minutes after boot. Restarting the MongoDB process on the server temporarily fixes it, but the problem usually pops up again at random.

I used to run test instances of MongoDB on AWS t2.micros with no issues, but I have been plagued by the kswapd0 issue since I tried to upgrade the base AMI to Ubuntu 16.04.

uname -a
Linux hostname 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message

Rasmus Larsen (rla-2) wrote on 2016-06-17:

#108

I had this issue too on AWS.

In my case, it was the udev rule for vm-hotadd and the fix as mentioned previously basically came down to "touch /etc/udev/rules.d/40-vm-hotadd.rules" which effectively disables the /lib/udev/rules.d/40-vm-hotadd.rules file (after a reboot).

The udev rule basically seems to only be active for Xen or Hyper-V and while it seems the Hyper-V stuff was also present in previous versions, the Xen stuff seems to be introduced in 15.10 or newer.

So if you're seeing this issue on anything running on Xen, including AWS, try:

touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

This is probably a bug in Xen or a bug in the kernel.
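
The workaround relies on udev giving a rules file under /etc/udev/rules.d precedence over a file of the same name under /lib/udev/rules.d, so an empty file there masks the packaged rule. A minimal sketch for checking which state a machine is in follows. The helper name `hotadd_rule_status` and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory.

```shell
#!/bin/bash
# Sketch only: report whether the packaged hot-add rule is masked by a file
# of the same name under /etc. hotadd_rule_status is a hypothetical helper.
hotadd_rule_status() {
    local root="${1:-}"
    local name="40-vm-hotadd.rules"
    local packaged="$root/lib/udev/rules.d/$name"
    local override="$root/etc/udev/rules.d/$name"

    if [ ! -e "$packaged" ]; then
        echo "not-installed"      # nothing to mask on this system
    elif [ ! -e "$override" ]; then
        echo "active"             # packaged rule is in effect
    elif [ -s "$override" ]; then
        echo "overridden"         # non-empty file replaces the packaged rule
    else
        echo "masked"             # empty file disables the packaged rule
    fi
}

# Usage:
#   hotadd_rule_status
```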

Revision history for this message

Gregg King (greggking) wrote on 2016-06-19:

#109

Thank you so much Rasmus!

Your solution worked for me:
sudo touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

I could always trigger the CPU usage bug with this:
stress --cpu 8 --io 4 --vm 7 --vm-bytes 128M --vm-hang 3 --timeout 60s

If running it once didn't trigger it, a second run would. Now I've run it
over and over, and kswapd0 hits 1-2% during the stress test and drops back
down as soon as it ends.

Like Rasmus says, it's an AWS instance which runs on Xen.

Revision history for this message

quinn (q-shanahan) wrote on 2016-06-23:

#110

Is there a fix for this that doesn't require a reboot / something I could add to an ec2 instance's user_data?

Revision history for this message

Argat (argat) wrote on 2016-06-29:

#111

The suggested workaround works for me also on AWS instances with 15.10:

sudo touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

Been stable now for over a week.

Revision history for this message

Joern Heissler (joernheissler) wrote on 2016-07-03:

#112

Can this workaround please be added to the official EC2 images? EC2 cannot hot swap CPU or memory from what I know.
Having this workaround built in would mean not having to reboot every newly launched instance.

Revision history for this message

In Linux Kernel Bug Tracker #65201, mail+kernel-bugzilla (mail+kernel-bugzilla-linux-kernel-bugs) wrote on 2016-07-25:

#189

Same thing on Thinkpad X220 with 8 GB RAM running Ubuntu 14.04, with Ubuntu's Kernel 3.16.0-77-generic.

Swap is disabled.

kswapd0 runs on high CPU and the HD light is on all the time during this (no idea why).

After 20 (!) minutes the OOM killer manages to kill a process to resolve the situation.

Revision history for this message

Christopher Snowhill (kode54) wrote on 2016-08-03:

#113

Is this known to affect paravirtualized instances, or is it restricted to hvm? Can anyone tell me what conditions I need to create this in a fresh instance? I'll spin up a PV t2.nano and see if I can reproduce it there.

Revision history for this message

Øystein Gisnås (oystein-gisnas) wrote on 2016-08-03:

#114

I've tried this on t1.micro (PV) and t2.micro (HVM) instances in eu-west-1. To reproduce, I used the following two commands:
sudo apt install docker.io
sudo docker run -p 80:8080 cptactionhank/atlassian-jira

The startup should work, but navigate to http://instanceaddress/ and choose "I'll set it up myself" and "Built In" database. 10-20 seconds after you click "Next", you should see the memory being exhausted and kswapd0 use half of the CPU time.

Test results:
ubuntu/images/ebs-ssd/ubuntu-xenial-16.04-amd64-server-20160721 (PV): kswapd0 OK
ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20160721 (HVM): kswapd0 high CPU usage

It's worth noting that the set of memory blocks varies between distros and between PV and HVM:
Amazon Linux HVM: /sys/devices/system/memory/memory[0-7]
RHEL 7.2 HVM: /sys/devices/system/memory/memory[0-7]
Ubuntu 16.04 HVM: /sys/devices/system/memory/memory[0-8]
Ubuntu 16.04 PV: /sys/devices/system/memory/memory[0-4]

Why Ubuntu on HVM has an extra memory block is a mystery. It seems to be offline by default, but enabled by the udev hotadd rule. And EC2 doesn't support hotadd.

As Joern Heissler suggested, why not remove the hotadd rule from the official images as a workaround? Although the underlying problem probably is related to why the additional memory block is there at all.
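
To compare the memory blocks on a given instance with the list above, each memoryN directory in sysfs exposes a `state` file. A minimal sketch follows. `list_memory_blocks` is a hypothetical helper name, and the optional argument only exists so it can be pointed at a copy of the directory tree.

```shell
#!/bin/bash
# Sketch only: print each memory block and its state ("online" or "offline").
# list_memory_blocks is a hypothetical helper name.
list_memory_blocks() {
    local base="${1:-/sys/devices/system/memory}"
    local dir
    for dir in "$base"/memory[0-9]*; do
        # Skip entries without a readable state file (including an unmatched glob).
        [ -r "$dir/state" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/state")"
    done
}

# Usage:
#   list_memory_blocks
```

Running it before and after masking the hot-add rule should show whether the extra block is being brought online.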

Revision history for this message

Felix Bünemann (felix-buenemann) wrote on 2016-08-03:

#115

I have run into this issue when using the goofys S3 FUSE filesystem (https://github.com/kahing/goofys) on t2.small instances when copying large files (which causes many memory buffers to be allocated). I think anything that stresses the memory subsystem will be able to trigger it.

Revision history for this message

Robin Miller (robincello) wrote on 2016-08-03:

#116

We have been seeing this issue intermittently on a set of servers that were running Ubuntu 15.10 and then 16.04. After overriding that udev vm hotadd rule as suggested above a couple weeks ago, the issue has yet to return (not a conclusive result, but so far so good).

Revision history for this message

Robin Miller (robincello) wrote on 2016-08-03:

#117

To add - these servers are all built on the official Ubuntu Amazon EC2 AMIs of the 'ebs-ssd' variety.

Revision history for this message

Andrew Tappert (andrewtappert) wrote on 2016-08-04:

#118

The original description says "kswapd0 falls into a busy loop and spins on 100% CPU usage indefinitely". But I think, while the effect may be similar, the actual behavior is a bit different. I think what is happening is that kswapd is accessing pages of memory that are causing the hypervisor (rather than the kernel) to do extra work. If you look at the overall CPU utilization of the instance, you'll see high "st" (steal) time. This can also be provoked manually, for example by trying to read via /proc/kcore from the extra memory region that has been identified in discussion above (for example, try to do a full memory dump with the Volatility getkcore tool).

Revision history for this message

José Martínez (xosemp) wrote on 2016-08-05:

#119

fillmem.sh (440 bytes, text/x-sh)

I wrote a tiny shell script to reliably reproduce the bug. It mounts a tmpfs filesystem and writes a file that fills 98% of the currently available memory.

You can also pass it a custom percentage, like: ./fillmem.sh 95

<95% is hit and miss on a newly launched instance. 98% (the default) has immediately spun kswapd to 100% in all of my tests.

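The attached script itself is not reproduced in this thread. The sizing step it describes could look roughly like the sketch below, which is not the attachment's code. `fill_size_kb` is a hypothetical helper, and it assumes a kernel that reports `MemAvailable` in /proc/meminfo.

```shell
#!/bin/bash
# Sketch only, not the attached fillmem.sh: compute how many KiB correspond
# to a given percentage of MemAvailable. fill_size_kb is a hypothetical name.
fill_size_kb() {
    local percent="${1:-98}"
    local meminfo="${2:-/proc/meminfo}"
    awk -v pct="$percent" \
        '/^MemAvailable:/ { printf "%d\n", $2 * pct / 100 }' "$meminfo"
}

# Usage (as root; this fills memory on purpose, so use a disposable instance):
#   mkdir -p /mnt/fill && mount -t tmpfs -o size=100% tmpfs /mnt/fill
#   dd if=/dev/zero of=/mnt/fill/blob bs=1K count="$(fill_size_kb 98)"
```
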
----------

@andrewtappert: steal time means the T2 instance has run out of CPU credits. They launch with just enough credits to burst the CPU to 100% for 30 minutes.

I'm always getting SYS time on kswapd, until I hit the T2 credits limit.
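
To tell system time apart from steal time without relying on top, the aggregate `cpu` line of /proc/stat can be sampled twice and the deltas compared. A minimal sketch follows; `cpu_sys_steal_pct` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: given two "cpu ..." lines from /proc/stat taken some seconds
# apart, print the share of elapsed CPU time spent in system and in steal.
# cpu_sys_steal_pct is a hypothetical helper name.
cpu_sys_steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Columns after the "cpu" label:
        # user nice system idle iowait irq softirq steal (fields 2..9)
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
        NR == 2 {
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total > 0)
                printf "sys=%d%% steal=%d%%\n", 100 * d[4] / total, 100 * d[9] / total
        }'
}

# Usage:
#   a=$(head -n1 /proc/stat); sleep 5; b=$(head -n1 /proc/stat)
#   cpu_sys_steal_pct "$a" "$b"
```

A high `sys` share would match the observation above, and a high `steal` share would point at exhausted T2 credits or hypervisor work instead.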

Revision history for this message

In Linux Kernel Bug Tracker #65201, n.sherlock (n.sherlock-linux-kernel-bugs) wrote on 2016-08-25:

#190

Same problem on Amazon's t2.nano instance (512MB of RAM). Seemed to be triggered by doing a bunch of file IO. This is a brand new install of Ubuntu 16.04. I have no swap enabled, and yet:

top - 06:42:57 up 1:58, 1 user, load average: 2.43, 2.66, 2.31
Tasks: 125 total, 3 running, 122 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.1 us, 6.9 sy, 0.0 ni, 0.0 id, 0.9 wa, 0.0 hi, 0.0 si, 90.1 st
KiB Mem : 498416 total, 348096 free, 49772 used, 100548 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 411900 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29 root 20 0 0 0 0 R 65.0 0.0 103:16.64 kswapd0
14343 root 20 0 0 0 0 R 2.9 0.0 0:00.82 python

Running "echo 1 > /proc/sys/vm/drop_caches" didn't fix the problem, but "echo 3" fixed it immediately.

Also, my /tmp isn't full at all (6.5GB / 85% left on root).
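
The difference between the two values may matter here: writing 1 to drop_caches frees the page cache only, 2 frees reclaimable slab objects such as dentries and inodes, and 3 does both. A rough way to see how much each mode could touch is to read the matching /proc/meminfo fields. This is a sketch only. `reclaimable_summary` is a hypothetical helper, and `Cached` also counts tmpfs pages that cannot be dropped.

```shell
#!/bin/bash
# Sketch only: show roughly how much memory each drop_caches mode targets.
# Mode 1 targets the page cache ("Cached"), mode 2 the reclaimable slab
# ("SReclaimable"), and mode 3 both.
# reclaimable_summary is a hypothetical helper name.
reclaimable_summary() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^Cached:/       { cached = $2 }
         /^SReclaimable:/ { slab = $2 }
         END { printf "pagecache_kb=%d slab_kb=%d\n", cached, slab }' "$meminfo"
}

# Usage:
#   reclaimable_summary
```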

Revision history for this message

In Linux Kernel Bug Tracker #65201, n.sherlock (n.sherlock-linux-kernel-bugs) wrote on 2016-08-25:

#191

A workaround for machines running under Xen has been found over on Ubuntu's bug tracker, see comment #69:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

The workaround is to disable hot-add of memory:

touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

Revision history for this message

In Linux Kernel Bug Tracker #65201, dek94 (dek94-linux-kernel-bugs) wrote on 2016-08-30:

#192

I tried the same Ubuntu inspired "disable hot-add of memory" (and CPU) workaround under AWS EC2 HVM, Centos 7.x with mainline (elrepo) 4.4.15 kernel: no such luck, I still see this occasionally.

Revision history for this message

Andy Robertson (andyrobertson101) wrote on 2016-09-03:

#120

All of my t2.micro & nano instances are affected by this in AWS EC2 after upgrading to Ubuntu16.

Running "echo 1 > /proc/sys/vm/drop_caches" (I also tried echo 3) works for a short period of time, but the problem comes back within a few minutes.

I moved a couple instances over to f1.micro on GCP / GCE (1 vCPU, 0.6 GB memory) and the problem seems to have gone away. I can't do this with all of my instances yet though so a fix in AWS would be nice.

Revision history for this message

Paul Csiki (paulcsiki) wrote on 2016-09-03:

#121

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/69

Seems to have resolved all these issues on AWS t2.micro instances.

Revision history for this message

Gregg King (greggking) wrote on 2016-09-03:

#122

Did you try this from Rasmus up above?

sudo touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot

That fixed it for me on EC2

On Saturday, 3 September 2016, Andy Robertson <andyrobertson101@gmail.com>
wrote:

> All of my t2.micro & nano instances are affected by this in AWS EC2
> after upgrading to Ubuntu16.
>
> doing "echo 1 > /proc/sys/vm/drop_caches" (also tried echo 3) works for
> a short period of time, but it comes back within a few minutes.
>
> I moved a couple instances over to f1.micro on GCP / GCE (1 vCPU, 0.6 GB
> memory) and the problem seems to have gone away.  I can't do this with
> all of my instances yet though so a fix in AWS would be nice.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1518457
>
> Title:
>   kswapd0 100% CPU usage
>
> Status in Linux:
>   Unknown
> Status in linux package in Ubuntu:
>   Confirmed
>
> Bug description:
>   As per bug 721896 and various others:
>
>   I'm on an AWS t2.micro instance (Xeon E5-2670, 991MiB of memory).
>   Occasionally (about once a day), kswapd0 falls into a busy loop and
>   spins on 100% CPU usage indefinitely. This can be provoked by
>   copying/writing large files (e.g. dding a 256MB file), but it happens
>   occasionally otherwise. System memory usage (not including
>   buffers/caches) currently sits at 36%, which is typical[1]. Initially
>   I had no swap space configured; I've since tried enabling a 256MB swap
>   file, but the problem continues to occur and no swap space is used.
>   The system can be recovered with `echo 1 > /proc/sys/vm/drop_caches`.
>
>   Happy to provide further information/take further debugging actions.
>
>
>   [1] Full output from `free`:
>                total       used       free     shared    buffers     cached
>   Mem:       1014936     483448     531488      28556       9756     112700
>   -/+ buffers/cache:     360992     653944
>   Swap:       262140          0     262140
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 15.10
>   Package: linux-image-4.2.0-18-generic 4.2.0-18.22
>   ProcVersionSignature: Ubuntu 4.2.0-18.22-generic 4.2.3
>   Uname: Linux 4.2.0-18-generic x86_64
>   AlsaDevices:
>    total 0
>    crw-rw---- 1 root audio 116,  1 Nov 19 19:40 seq
>    crw-rw---- 1 root audio 116, 33 Nov 19 19:40 timer
>   AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
>   ApportVersion: 2.19.1-0ubuntu5
>   Architecture: amd64
>   ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
>   AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
> '/dev/snd/timer'] failed with exit code 1:
>   CRDA: N/A
>   Date: Fri Nov 20 20:44:30 2015
>   Ec2AMI: ami-1c552a76
>   Ec2AMIManifest: (unknown)
>   Ec2AvailabilityZone: us-east-1d
>   Ec2InstanceType: t2.micro
>   Ec2Kernel: unavailable
>   Ec2Ramdisk: unavailable
>   IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
>   Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to
> initialize libusb: -99
>   MachineType: Xen HVM domU
>   PciMultimedia:
>
>   ProcEnviron:
>    TERM=screen
>    PATH=(custom, no user)
>    LANG=en_US.UTF-8
>    SHELL=/bin/bash
>   ProcFB: 0 xen
>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-18-generic
> root=UUID=35bc01f4-4602-4823-976e-508edef899df ro console=tty1
> console=ttyS0 net.ifnames=0
>   RelatedPackageVersions:
>    linux-restricted-modules-4.2.0-18-generic N/A
>    linux-backports-modules-4.2.0-18-generic  N/A
>    linux-firmware                            N/A
>   RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
>   SourcePackage: linux
>   UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   dmi.bios.date: 05/06/2015
>   dmi.bios.vendor: Xen
>   dmi.bios.version: 4.2.amazon
>   dmi.chassis.type: 1
>   dmi.chassis.vendor: Xen
>   dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd05/
> 06/2015:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
>   dmi.product.name: HVM domU
>   dmi.product.version: 4.2.amazon
>   dmi.sys.vendor: Xen
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/linux/+bug/1518457/+subscriptions
>

Revision history for this message

Paul Buonopane (zenexer) wrote on 2016-09-17:

#123

From man 7 udev:

The udev rules are read from the files located in the system rules directory /lib/udev/rules.d, the volatile runtime directory /run/udev/rules.d and the local administration directory /etc/udev/rules.d. All rules files are collectively sorted and processed in lexical order, regardless of the directories in which they live. However, files with identical filenames replace each other. Files in /etc have the highest priority, files in /run take precedence over files with the same name in /lib. This can be used to override a system-supplied rules file with a local file if needed; a symlink in /etc with the same name as a rules file in /lib, pointing to /dev/null, disables the rules file entirely. Rule files must have the extension .rules; other extensions are ignored.

As such, the best way to work around this is:

sudo ln -s /dev/null /etc/udev/rules.d/40-vm-hotadd.rules

Unlike deleting or modifying the original file, this will persist across upgrades without requiring manual conflict resolution.
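
The override behaviour quoted from the man page can be sketched in a few lines of Python. This only illustrates the lookup order described above and is not udev's actual implementation; the function name `effective_rules` and the dictionary that stands in for the filesystem are hypothetical.

```python
import posixpath

# Directory priority as described in udev(7): /etc beats /run, /run beats /lib.
PRIORITY = ["/etc/udev/rules.d", "/run/udev/rules.d", "/lib/udev/rules.d"]


def effective_rules(files):
    """Return the rules files that take effect, in processing order.

    `files` maps a full path to its symlink target, or to None for a
    regular file. It is a hypothetical in-memory stand-in for the disk.
    """
    chosen = {}
    for directory in PRIORITY:
        for path in files:
            if posixpath.dirname(path) != directory:
                continue
            name = posixpath.basename(path)
            if not name.endswith(".rules"):
                continue  # other extensions are ignored
            # The first directory in PRIORITY that provides a name wins.
            chosen.setdefault(name, path)
    # Files are processed in lexical order of file name, whatever their directory.
    ordered = [chosen[name] for name in sorted(chosen)]
    # A symlink pointing at /dev/null disables the rules file entirely.
    return [path for path in ordered if files[path] != "/dev/null"]
```

With a `/dev/null` symlink at `/etc/udev/rules.d/40-vm-hotadd.rules`, the copy in `/lib` is shadowed and nothing is loaded for that name. An empty regular file created with `touch` would be selected instead of the `/lib` copy, which has the same practical effect because it contains no rules.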

Revision history for this message

Richard Trout (richard-trout) wrote on 2016-09-26:

#124

Thanks, #123 (as easy as?) works for me as a workaround.

Revision history for this message

Poldi (poldi) wrote on 2016-09-29:

#125

I have the same issue on 16.04

Revision history for this message

Dan Streetman (ddstreet) wrote on 2016-10-01:

#126

The problem is a bit complex. The Xen hypervisor uses memory ballooning, to control how many memory pages the guest can use. The kernel enumerates its e820 memory at boot, and since it's only 1G in this case, it all gets placed into the DMA32 zone. Then later during boot when the Xen balloon driver is initialized, it dynamically adds the balloon memory. The kernel always places hot-added memory into the Normal zone however, so the system winds up with the balloon memory, and only the balloon memory, in the Normal zone. Since the balloon driver starts at, or very close to, its memory target, only a very small number of pages are made available, which results in a Normal memory zone that's tiny - only 9 managed pages on the instance I tested. You can read the /proc/zoneinfo file to find the number of managed pages in the Normal zone.
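
For anyone who wants to check their own instance, the managed page count per zone can be pulled out of /proc/zoneinfo with a short script. This is a minimal sketch that assumes the usual layout ("Node N, zone NAME" header lines followed by indented "managed" fields); the function name `managed_pages` is made up for illustration.

```python
import re


def managed_pages(zoneinfo_text):
    """Map (node, zone name) to the 'managed' page count in zoneinfo text."""
    result = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            # Remember which zone the following indented fields belong to.
            current = (int(header.group(1)), header.group(2))
            continue
        field = re.match(r"\s+managed\s+(\d+)", line)
        if field and current is not None:
            result[current] = int(field.group(1))
    return result
```

On a live system it would be called as `managed_pages(open("/proc/zoneinfo").read())`. A Normal zone with only a handful of managed pages next to a much larger DMA32 zone matches the situation described above.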

Then when the system encounters memory pressure (i.e. very little free memory left), it wakes up the kswapd daemon to start freeing memory. The kswapd daemon then tries to "balance" memory by "balancing" each zone - DMA, DMA32, and Normal zones. However, it's essentially impossible for it to free pages from the Normal zone, because there are so few pages that whenever one is freed, the next page allocation takes it (because pages are usually allocated from the Normal zone first), and kswapd winds up in a continuous cycle of trying to free pages from the Normal zone forever.

This is also why disabling the udev memory hotadd (see comment 69) works around the problem - it prevents the Xen balloon driver from adding/enabling any of the pages in the Normal zone, so kswapd never has to bother trying to balance it, and thus there's no problem.

This appears to be fixed by Mel Gorman's 34-commit patch series that changes kswapd memory balancing to "per node" instead of "per zone":
https://marc.info/?l=linux-mm&m=146797052519026

That's a rather large patchset to backport to the xenial kernel, but I'll give it a try.

Changed in linux (Ubuntu):
assignee:	nobody → Dan Streetman (ddstreet)

Revision history for this message

Dan Streetman (ddstreet) wrote on 2016-10-01:

#127

The patch series that fixes this is included in yakkety (if anyone reproduces this on a yakkety kernel, please let me know), so this only needs fixing in xenial.

Revision history for this message

In Linux Kernel Bug Tracker #65201, ddstreet (ddstreet-linux-kernel-bugs) wrote on 2016-10-01:

#193

I detailed why this bug happens here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126

This appears to be fixed by Mel Gorman's patch series to change memory reclaim from "per zone" to "per node":
https://marc.info/?l=linux-mm&m=146797052519026

So this bug should be fixed with the latest kernel.

Revision history for this message

Dan Streetman (ddstreet) wrote on 2016-10-02:

#128

On review of the patch series, it's simply too large and complex to backport for this situation; it makes, and depends on, a rather large amount of change to the mm subsystem, and there are easier and smaller ways to work around this bug in the xenial kernel.

Specifically, a comparison of the Xen balloon driver vs. the virtio balloon driver shows an important difference; while the Xen balloon driver hot-adds memory as soon as it initializes, the virtio driver does not hot-add memory; it only adjusts its size to adjust the amount of free memory. Most importantly, the Xen balloon driver initially hot-adds memory but does not make any (except a very small amount) available for system use.

I'm looking at the Xen balloon driver to see how it can be changed to fix this bug.

Revision history for this message

In Linux Kernel Bug Tracker #65201, mail+kernel-bugzilla (mail+kernel-bugzilla-linux-kernel-bugs) wrote on 2016-10-02:

#194

(In reply to Dan Streetman from comment #40)
> I detailed why this bug happens here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
>
> So this bug should be fixed with the latest kernel.

Can you clarify? The link you mention seems to talk mainly about Xen. Do you think the latest kernel will also fix it for non-Xen machines?

Revision history for this message

In Linux Kernel Bug Tracker #65201, ddstreet (ddstreet-linux-kernel-bugs) wrote on 2016-10-02:

#195

(In reply to mail+kernel-bugzilla from comment #41)
> (In reply to Dan Streetman from comment #40)
> > I detailed why this bug happens here:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
> >
> > So this bug should be fixed with the latest kernel.
>
> Can you clarify, the link you mention seems to talk mainly about Xen. Do you
> think the latest kernel will fix it also for non-Xen machines?

What does your /proc/zoneinfo look like? Do you have a system with (approx.) <= 4 GB of memory and a Normal zone with few managed pages?

Revision history for this message

In Linux Kernel Bug Tracker #65201, mail+kernel-bugzilla (mail+kernel-bugzilla-linux-kernel-bugs) wrote on 2016-10-02:

#196

(In reply to Dan Streetman from comment #42)
> what does your /proc/zoneinfo look like? do you have a system with (approx)
> <= 4g and Normal zone with few managed pages?

My zoneinfo file right now looks like this: https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94

(I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment #36.)

Revision history for this message

In Linux Kernel Bug Tracker #65201, ddstreet (ddstreet-linux-kernel-bugs) wrote on 2016-10-02:

#197

(In reply to mail+kernel-bugzilla from comment #43)
> (In reply to Dan Streetman from comment #42)
> > what does your /proc/zoneinfo look like? do you have a system with
> (approx)
> > <= 4g and Normal zone with few managed pages?
>
> My zoneinfo file right now looks like this:
> https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94
>
> (I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment
> #36.)

That zoneinfo doesn't look like you're seeing the same problem, so if you are seeing consistent, sustained (not just transient) 100% cpu from kswapd, I think it's a different problem from what I described in comment 40.

Seth Forshee (sforshee) on 2016-10-12

Changed in linux (Ubuntu Xenial):
status:	New → Fix Committed
importance:	Undecided → High
assignee:	nobody → Dan Streetman (ddstreet)

Seth Forshee (sforshee) on 2016-10-12

Changed in linux (Ubuntu Yakkety):
status:	Confirmed → Fix Committed

Andy Whitcroft (apw) on 2016-10-13

Changed in linux (Ubuntu Yakkety):
status:	Fix Committed → Invalid

Revision history for this message

Launchpad Janitor (janitor) wrote on 2016-10-13:

#129

This bug was fixed in the package linux - 4.4.0-43.63

---------------
linux (4.4.0-43.63) xenial; urgency=low

[ Seth Forshee ]

  * Release Tracking Bug
    - LP: #1632375

  * kswapd0 100% CPU usage (LP: #1518457)
    - SAUCE: (no-up) If zone is so small that watermarks are the same, stop zone
      balance.

-- Seth Forshee <email address hidden> Tue, 11 Oct 2016 07:54:56 -0500

Changed in linux (Ubuntu Xenial):
status:	Fix Committed → Fix Released
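
The SAUCE patch title above refers to zone watermarks being "the same". A rough sketch of how that can happen follows. It assumes the proportional scheme used by 4.4-era kernels, where a zone's min watermark is its share of min_free_kbytes and the low and high watermarks sit 25% and 50% above it. The function name and the example numbers are illustrative and are not taken from the patch itself.

```python
def zone_watermarks(min_free_kbytes, zone_managed, total_managed, page_kb=4):
    """Approximate (min, low, high) watermarks, in pages, for one zone.

    Assumption: each zone receives a share of the global minimum that is
    proportional to its managed pages, using integer arithmetic.
    """
    pages_min = min_free_kbytes // page_kb
    wmark_min = pages_min * zone_managed // total_managed
    wmark_low = wmark_min + wmark_min // 4
    wmark_high = wmark_min + wmark_min // 2
    return wmark_min, wmark_low, wmark_high


# Illustrative figures only: a 9-page zone on a system with 250000 managed pages.
tiny_zone = zone_watermarks(4096, 9, 250000)       # -> (0, 0, 0)
large_zone = zone_watermarks(4096, 249991, 250000)
```

With integer arithmetic, the 9-page zone gets a min watermark of 0, so all three watermarks collapse to the same value, while the large zone gets three distinct thresholds.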

Revision history for this message

In Linux Kernel Bug Tracker #65201, samkostka (samkostka-linux-kernel-bugs) wrote on 2016-10-13:

#198

I'm assuming by latest kernel you mean 4.8? If so I'm looking forward to Arch pushing it through testing :)

Revision history for this message

Seth Forshee (sforshee) wrote on 2016-10-18:

#130

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags:

added: verification-needed-yakkety

Revision history for this message

CaptSaltyJack (csjubuntu) wrote on 2016-10-18:

#131

Will we see this fix make it to 16.04 LTS?

Revision history for this message

Felix Bünemann (felix-buenemann) wrote on 2016-10-20:

#132

The fix is already released in 16.04; make sure you have updated to linux-image 4.4.0-43.63 or later.
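
A quick way to compare an installed package version against that threshold is sketched below. It is a naive numeric comparison that is only meaningful for xenial 4.4.0 package versions such as "4.4.0-43.63", not for `uname -r` strings such as "4.4.0-43-generic"; the helper names are hypothetical.

```python
import re

# First xenial kernel package version reported in this thread as carrying the fix.
FIXED_XENIAL = (4, 4, 0, 43, 63)


def parse_kernel_version(version):
    """Turn a package version such as '4.4.0-43.63' into a tuple of ints."""
    numbers = re.findall(r"\d+", version)
    return tuple(int(n) for n in numbers[:5])


def has_xenial_fix(version):
    """True if the package version is at or beyond the fixed xenial release."""
    return parse_kernel_version(version) >= FIXED_XENIAL
```

For example, `has_xenial_fix("4.4.0-42.62")` is false and `has_xenial_fix("4.4.0-45.66")` is true.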

Revision history for this message

PierreF (pierre-fersing) wrote on 2016-10-20:

#133

If the verification also applies to 16.04: it does fix the issue.

We had a server that triggered the bug at least once a day (I suspect the unattended-upgrade run every morning triggered it). Since the upgrade, two and a half days ago, the server has had no issue.

Revision history for this message

Øystein Gisnås (oystein-gisnas) wrote on 2016-10-20:

#134

I have verified that the bug is fixed on 4.8.0-26.28 (yakkety), 4.4.0-43.63 (xenial) and 4.4.0-45.66 (xenial). Doing the same on 4.4.0-42.62 (xenial) reproduced the bug. All tests done on EC2 t2.small.

tags:

added: verification-done-yakkety
removed: verification-needed-yakkety

Revision history for this message

Raniz (raniz-1) wrote on 2016-10-25:

#135

After upgrading to 4.4.0-45 from 4.4.0-21 the issue seems to have gone away.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2016-11-09:

#136

This bug was fixed in the package linux - 4.8.0-27.29

---------------
linux (4.8.0-27.29) yakkety; urgency=low

[ Seth Forshee ]

* Release Tracking Bug
    - LP: #1635377

* proc_keys_show crash when reading /proc/keys (LP: #1634496)
    - SAUCE: KEYS: ensure xbuf is large enough to fix buffer overflow in
      proc_keys_show (LP: #1634496)

* Revert "If zone is so small that watermarks are the same, stop zone balance"
    in yakkety (LP: #1632894)
    - Revert "UBUNTU: SAUCE: (no-up) If zone is so small that watermarks are the
      same, stop zone balance."

* lts-yakkety 4.8 cannot mount lvm raid1 (LP: #1631298)
    - SAUCE: (no-up) dm raid: fix compat_features validation

* kswapd0 100% CPU usage (LP: #1518457)
    - SAUCE: (no-up) If zone is so small that watermarks are the same, stop zone
      balance.

* [Trusty->Yakkety] powerpc/64: Fix incorrect return value from
    __copy_tofrom_user (LP: #1632462)
    - SAUCE: (no-up) powerpc/64: Fix incorrect return value from
      __copy_tofrom_user

* Ubuntu 16.10: Oops panic in move_page_tables/page_remove_rmap after running
    memory_stress_ng. (LP: #1628976)
    - SAUCE: (no-up) powerpc/pseries: Fix stack corruption in htpe code

* Paths not failed properly when unmapping virtual FC ports in VIOS (using
    ibmvfc) (LP: #1632116)
    - scsi: ibmvfc: Fix I/O hang when port is not mapped

* [Ubuntu16.10]KV4.8: kernel livepatch config options are not set
    (LP: #1626983)
    - [Config] Enable live patching on powerpc/ppc64el

* CONFIG_AUFS_XATTR is not set (LP: #1557776)
    - [Config] CONFIG_AUFS_XATTR=y

* Yakkety update to 4.8.1 stable release (LP: #1632445)
    - arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
    - Using BUG_ON() as an assert() is _never_ acceptable
    - usb: misc: legousbtower: Fix NULL pointer deference
    - Staging: fbtft: Fix bug in fbtft-core
    - usb: usbip: vudc: fix left shift overflow
    - USB: serial: cp210x: Add ID for a Juniper console
    - Revert "usbtmc: convert to devm_kzalloc"
    - ALSA: hda - Adding one more ALC255 pin definition for headset problem
    - ALSA: hda - Fix headset mic detection problem for several Dell laptops
    - ALSA: hda - Add the top speaker pin config for HP Spectre x360
    - Linux 4.8.1

* PSL data cache should be flushed before resetting CAPI adapter
    (LP: #1632049)
    - cxl: Flush PSL cache before resetting the adapter

* thunder nic: avoid link delays due to RX_PACKET_DIS (LP: #1630038)
    - net: thunderx: Don't set RX_PACKET_DIS while initializing

* crypto/vmx/p8_ghash memory corruption (LP: #1630970)
    - crypto: ghash-generic - move common definitions to a new header file
    - crypto: vmx - Fix memory corruption caused by p8_ghash
    - crypto: vmx - Ensure ghash-generic is enabled

* arm64: SPCR console not autodetected (LP: #1630311)
    - of/serial: move earlycon early_param handling to serial
    - [Config] CONFIG_ACPI_SPCR_TABLE=y
    - ACPI: parse SPCR and enable matching console
    - ARM64: ACPI: enable ACPI_SPCR_TABLE
    - serial: pl011: add console matching function

* include/linux/security.h header syntax error with !CONFIG_SECURITYFS
    (LP: #1630990)
    - SAUCE: (no-up) include/linux/security.h -- fix syntax error with
      CONFIG_SECURITYFS=n

* sha1-powerpc returning wrong results (LP: #1629977)
    - crypto: sha1-powerpc - little-endian support

-- Seth Forshee <seth.forshee@canonical.com>  Thu, 20 Oct 2016 14:09:37 -0500

Changed in linux (Ubuntu Yakkety):
status:	Invalid → Fix Released

Revision history for this message

Luc Pi (oluc) wrote on 2016-11-14:

#138

> Dan Streetman (ddstreet) wrote on 2016-10-01: #127
>
> The patch series that fixes this is included in yakkety
> (if anyone reproduces this on a yakkety kernel, please let me know),

I can see it every now and then with Yakkety and linux 4.8.0-27.
Can you advise any action?

$ uname -a
Linux luc-MacBook 4.8.0-27-generic #29-Ubuntu SMP Thu Oct 20 21:01:44 UTC 2016 i686 i686 i686 GNU/Linux

$ lsb_release -a
Description: Ubuntu 16.10
Codename: yakkety

Revision history for this message

In Linux Kernel Bug Tracker #65201, jc (jc-linux-kernel-bugs) wrote on 2016-11-15:

#199

I am having the same issue on Fedora 24 with kernel 4.8.6. So I guess it has not been pushed there, or it does not fix anything.
It is a huge job stopper, as I need to transfer many files between two USB disks.
kswapd0 appears at the top of the process list after a while and slowly degrades overall performance until I have to hard-reboot the machine in the middle of a transfer.

Revision history for this message

In Linux Kernel Bug Tracker #65201, samkostka (samkostka-linux-kernel-bugs) wrote on 2016-11-15:

#200

My guess is Fedora didn't put the changes through or something, because 4.8 has DEFINITELY fixed it for me. I used to have to reboot about twice daily due to this, but ever since I upgraded to 4.8 it hasn't happened once.

Revision history for this message

In Linux Kernel Bug Tracker #65201, me (me-linux-kernel-bugs) wrote on 2016-11-20:

#201

I'm on openSUSE with 4.8.8 and still have this issue.

Revision history for this message

velis (jure-erznoznik-gmail) wrote on 2016-11-22:

#139

Following this thread: I had the same issue running stock 16.04 (xenial) with kernel 4.4.0-38.
I have upgraded the kernel to 4.8.10-040810; the end result is the same, but the symptoms are a bit different.

Note: 1GB of RAM, no swap at all.

(old kernel)
With 50% of RAM in buffers/cache, kswapd0 took all the CPU it could (~75%). iostat showed pretty much 0% utilisation. The server ground to a halt as soon as free RAM ran out, without ever "eating into" the buffers/cache.

(new kernel)
Again with 50% of RAM in buffers/cache, kswapd0 no longer takes 100% CPU, but it still rises to the top of the list in top because it uses more CPU than the other processes. One difference is that ~75% of CPU time is now spent *WA*iting (I/O wait). A second difference is that, after free RAM is consumed, buffers/cache are also reduced for a while in favour of the RAM-hungry app. However, even this only goes down to about 40-43% of RAM being used by buffers/cache.

Hopefully this makes at least some sense.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2016-12-06:

#140

This bug was fixed in the package linux - 4.8.0-30.32

---------------
linux (4.8.0-30.32) yakkety; urgency=low

* CVE-2016-8655 (LP: #1646318)
- packet: fix race condition in packet_set_ring

-- Brad Figg <email address hidden> Thu, 01 Dec 2016 08:02:53 -0800

Changed in linux (Ubuntu):
status:	Invalid → Fix Released

Revision history for this message

In Linux Kernel Bug Tracker #65201, 00cpxxx (00cpxxx-linux-kernel-bugs) wrote on 2016-12-09:

#202

I'm on Debian with 4.8.7 and still have this issue.

Revision history for this message

Dan Streetman (ddstreet) wrote on 2016-12-14:

#141

> I can see it every now and then with Yakkety and linux 4.8.0-27.
> Can you advise any action?

What do you mean by "every now and then"? Do you mean kswapd occasionally runs at 100% for a short time? That's normal.

> kswapd0 no longer takes 100% CPU, but it still emerges to the top of the list in top as it still
> manages to take more than the other processes

This sounds like kswapd is just doing its job now. If you stop all your applications, does kswapd still take up CPU time indefinitely?
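
To tell a short burst from sustained load, kswapd0's CPU time can be sampled twice from /proc/<pid>/stat and compared. The helpers below are a minimal sketch; the function names are made up, and the field positions (utime and stime as fields 14 and 15) follow the proc(5) layout.

```python
def cpu_ticks(stat_line):
    """Return utime + stime, in clock ticks, from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage of one process over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (interval_s * ticks_per_s)
```

On Linux, `ticks_per_s` is available as `os.sysconf("SC_CLK_TCK")`. Taking samples a few minutes apart with all applications stopped would show whether kswapd0 stays near 100% or settles back to idle.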

Revision history for this message

Tuomo Sipola (tuomosipola) wrote on 2016-12-29:

#142

Running Ubuntu 16.10 Yakkety. The old 4.4.0-45-generic kernel works nicely. All the new 4.8.0 kernels eventually go berserk with kswapd0; just today it happened with the newest, 4.8.0-32-generic. Normal usage: just a couple of terminals, Chromium, Nautilus windows and Evince PDF documents open.

Revision history for this message

Wilhelm Buchmueller (wilhelm-buchmueller) wrote on 2017-01-03:

#143

Running 4.8.13-100.fc23.i686+PAE, no desktop; tried with swapon and swapoff, and with swappiness 0, 60, and 100.
kswapd0 usage is high while reading from /dev/sda (not mounted, internal SSD with 500+MB/s read).
After reading stops, the kswapd0 usage is gone.
No problem when reading from a USB HDD.

Is this a problem with high-speed reading?

Revision history for this message

Dan Streetman (ddstreet) wrote on 2017-01-03:

#144

This bug is already Fix Released; any new problems should be reported in a new bug.

Revision history for this message

In Linux Kernel Bug Tracker #65201, Wilhelm.Buchmueller (wilhelm.buchmueller-linux-kernel-bugs) wrote on 2017-01-06:

#203

4.8.13-100.fc23.i686+PAE #1
/dev/sda is Samsung SSD 850 EVO 250GB

swapoff -va
sysctl vm.drop_caches=3

Problem: these always cause heavy kswapd0 load:
  cat /dev/sda >> /dev/zero
  hdparm -t /dev/sda
  ddrescue /dev/sda /dev/zero -vf
  hexdump /dev/sda
  dd if=/dev/sda of=/dev/zero
  etc.

No problem (read speed ~500MB/s, except hdparm):
  hdparm --direct -t /dev/sda
  dd iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
  ddrescue --direct /dev/sda /dev/zero -vf -b 4096 -c 8192
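
The commands in the second group use O_DIRECT, which bypasses the page cache, while the first group reads through it. One way to watch the difference is to compare the Buffers and Cached figures in /proc/meminfo before and during each command; reads from a raw block device are normally accounted under Buffers. The helper below is a small sketch with a hypothetical name, and it does not by itself show that cache growth is the cause of the kswapd0 load.

```python
def meminfo_kb(meminfo_text, key):
    """Return the numeric value of one /proc/meminfo field, e.g. 'Buffers'."""
    for line in meminfo_text.splitlines():
        name, _, value = line.partition(":")
        # Exact match, so 'Cached' does not pick up 'SwapCached'.
        if name.strip() == key:
            return int(value.split()[0])
    raise KeyError(key)
```

It would be called as `meminfo_kb(open("/proc/meminfo").read(), "Buffers")` once before starting a read and again while it is running.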

Revision history for this message

In Linux Kernel Bug Tracker #65201, dclowes1 (dclowes1-linux-kernel-bugs) wrote on 2017-01-08:

#204

I am not sure if this is the same bug, but for me kswapd0 goes high-cpu following a page allocation failure in xhci_segment_alloc and I think that this has been occurring since moving to 4.8 on Fedora 24. I don't remember experiencing it before that. Currently on 4.8.15.

I normally boot with 3 or 4 USB 3.0 disks attached and, after the upgrade to 4.8.x noticed that kswapd0 was running at 100%. I went back to 4.7.x and no problem. Searches on this issue frequently referred to USB disks so I unplugged and rebooted.

If I unplug all of my USB 3.0 devices I get a normal boot, even with a USB weather station, keyboard, and mouse attached. Sometimes one or two USB 3.0 disks are OK too. If I boot with all of the USB 3.0 disks connected, I get a kworker page allocation failure, and after boot kswapd0 is high-cpu, usually split across 2-4 cores.

If I boot with two USB 3.0 disks and get a normal boot (no page allocation failure and normal kswapd) and then plug in a hub with the rest of the disks (and a USB 3.0 card reader) I get the page allocation failure at that point and kswapd0 goes high-cpu.

I have not looked at them all, but whenever I see kswapd0 high-cpu and I do look, there is the page allocation failure in the log.

The 'perf top' command shows different information from time to time, but the top contenders are frequently 'shrink_inactive_list', 'inactive_list_is_low', 'find_next_bit', 'shrink_node_memcg', and '_raw_spin_lock', to name a few.

Makes me wonder if the xhci allocation failure is the trigger, and fails to clean up on the error exit path, and kswapd0 is just a hapless victim. There is a stack trace (on ubuntu kernel) of the page allocation failure in the dmesg attached to https://bugzilla.redhat.com/show_bug.cgi?id=1395825 on this issue but I have more if it would help.

I have 19GiB free on a 24GiB machine so there should be no memory shortage to prompt swapping or the page allocation failure.

I had also frequently noticed that not all of my USB disks were mounted after boot and that I had to remove and reinsert a disk to use it. IIRC this also affected my USB 2.0 disks, and it predates the upgrade to 4.8.