[Ubuntu 16.10] - System crashes with call traces when the libhugetlbfs test suite is run.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Triaged | High | Canonical Kernel Team |
Bug Description
== Comment: #0 - Santhosh G <email address hidden> - 2016-09-27 01:55:00 ==
Issue:
The kernel is unable to handle a paging request when the heapshrink test case from the libhugetlbfs suite is run.
Environment:
arch - ppc64le
Ubuntu KVM guest
Host related Info:
Kernel:
-----------------
uname -a
Linux ltc-haba1 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
Memory:
-------
root@ltc-haba1:~# free -h
total used free shared buff/cache available
Mem: 255G 65G 187G 22M 1.9G 188G
Swap: 225G 0B 225G
Hugepages configured:
-------
root@ltc-haba1:~# cat /proc/meminfo | grep -i Huge
AnonHugePages: 81920 kB
ShmemHugePages: 0 kB
HugePages_Total: 4096
HugePages_Free: 3584
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
Guest Related Info:
-------
Kernel:
-------
root@ubuntu:
Linux ubuntu 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
Memory:
-------
root@ubuntu:
total used free shared buff/cache available
Mem: 8.0G 133M 7.7G 15M 132M 7.5G
Swap: 3.3G 0B 3.3G
Hugepages configured:
-------
root@ubuntu:
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 256
HugePages_Free: 256
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
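The hugepage counters above can be turned into pool sizes: each pool is HugePages_Total multiplied by Hugepagesize, and the pages in use are HugePages_Total minus HugePages_Free. The bash sketch below does that arithmetic for /proc/meminfo-style text on stdin; the function name `hugepage_pool` is hypothetical and not part of any existing tool. For the host, 4096 pages of 16 MiB give a 64 GiB pool with 512 pages (8 GiB) in use, which is consistent with, though not proof of, the 8.0G guest being backed by host huge pages.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise hugepage pool usage from /proc/meminfo-style
# input on stdin. Prints "total_mib used_mib free_mib".
hugepage_pool() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }   # huge page size in kB
        END {
            mib = size_kb / 1024                # huge page size in MiB
            printf "%d %d %d\n", total * mib, (total - free) * mib, free * mib
        }'
}

# Usage on a live system:
#   hugepage_pool < /proc/meminfo
```

With the host values above it prints `65536 8192 57344`; with the guest values it prints `4096 0 4096`.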
Steps to reproduce:
1. Install an Ubuntu KVM guest whose memory is backed by huge pages.
2. Clone the latest libhugetlbfs with git from https:/
3. Configure huge pages in the guest and run make check.
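Steps 2 and 3 above can be sketched as a small bash script to run as root inside the guest. Several details are assumptions rather than facts from the report: the clone URL in the report is truncated, so `REPO_URL` must be supplied by the user; `/mnt/huge` is an arbitrary hugetlbfs mount point, mounted because the test suite is assumed to need one; and 256 pages simply matches the guest's HugePages_Total. The function names `run` and `reproduce` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction sketch for steps 2-3, run as root in the guest.
# REPO_URL is NOT taken from the report (its URL is truncated); supply it.
REPO_URL="${REPO_URL:-}"
NR_HUGEPAGES="${NR_HUGEPAGES:-256}"          # matches the guest's HugePages_Total
HUGETLBFS_MNT="${HUGETLBFS_MNT:-/mnt/huge}"  # assumed mount point

run() {
    # Echo each command; execute it only when DRY_RUN is not 1.
    # Returns the command's exit status (0 in dry-run mode).
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

reproduce() {
    if [ -z "$REPO_URL" ]; then
        echo "REPO_URL is not set" >&2
        return 1
    fi
    # Step 3a: reserve huge pages and mount hugetlbfs (assumed prerequisite).
    run sh -c "echo $NR_HUGEPAGES > /proc/sys/vm/nr_hugepages" || return
    run mkdir -p "$HUGETLBFS_MNT" || return
    run mount -t hugetlbfs none "$HUGETLBFS_MNT" || return
    # Step 2: fetch the test suite.
    run git clone "$REPO_URL" libhugetlbfs || return
    # Step 3b: build and run the tests.
    run make -C libhugetlbfs check
}
```

After sourcing the file, `DRY_RUN=1 REPO_URL=<url> reproduce` only prints the commands; dropping `DRY_RUN` executes them. The dry-run wrapper is there so the sequence can be reviewed before it is run on a guest that is expected to crash.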
xmon is configured on the system.
When the test runs, the system produces call traces and drops into the xmon console:
HUGETLB_VERBOSE=1 HUGETLB_
[ 281.735804] Faulting instruction address: 0xc00000000027b410
cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8c3730]
pc: c00000000027b410: shrink_
lr: c00000000027b3f4: shrink_
sp: c0000001fa8c39b0
msr: 800000010280b033
dar: 4200000000328e38
dsisr: 42000000
current = 0xc0000001fa8adc00
paca = 0xc00000000fb80900 softe: 0 irq_happened: 0x01
pid = 50, comm = kswapd0
Linux version 4.8.0-17-generic (buildd@
enter ? for help
[c0000001fa8c3aa0] c00000000027bbdc shrink_
[c0000001fa8c3bc0] c00000000027bf0c shrink_
[c0000001fa8c3c80] c00000000027d500 kswapd+0x460/0x990
[c0000001fa8c3d80] c0000000000fd120 kthread+0x110/0x130
[c0000001fa8c3e30] c0000000000098f0 ret_from_
xmon logs:
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8e7730]
pc: c00000000027b410: shrink_
lr: c00000000027b3f4: shrink_
sp: c0000001fa8e79b0
msr: 800000010280b033
dar: 42000000000c58d0
dsisr: 42000000
current = 0xc0000001fa8a0000
paca = 0xc00000000fb80900 softe: 0 irq_happened: 0x01
pid = 50, comm = kswapd0
Linux version 4.8.0-17-generic (buildd@
1:mon> r
R00 = c00000000027b3f4 R16 = c0000001fffcfe00
R01 = c0000001fa8e79b0 R17 = 000000000000010a
R02 = c0000000014e5e00 R18 = 42000000000cbdd0
R03 = 0000000000000001 R19 = c0000001fffc6300
R04 = 0000000000000005 R20 = c0000001fa8e79e0
R05 = 0000000000000000 R21 = c0000001fe144800
R06 = f0000000003bc9a0 R22 = 0000000000000001
R07 = 00000001fee30000 R23 = 0000000000000005
R08 = 000000000000002a R24 = 000000000000207d
R09 = 0000000000000000 R25 = 0000000000000100
R10 = c000000001034e86 R26 = 0000000000000200
R11 = 0000000000000000 R27 = c0000001fa8e79d0
R12 = 0000000000002200 R28 = c0000001fa8e7ca0
R13 = c00000000fb80900 R29 = 0000000000000040
R14 = f000000000380000 R30 = c0000001fe144800
R15 = f000000000380020 R31 = c0000001fa8e79f0
pc = c00000000027b410 shrink_
cfar= c0000000000b47a4 kvmppc_
lr = c00000000027b3f4 shrink_
msr = 800000010280b033 cr = 24022222
ctr = c0000000002ba900 xer = 0000000020000000 trap = 300
dar = 42000000000c58d0 dsisr = 42000000
1:mon> t
[c0000001fa8e7aa0] c00000000027bc70 shrink_
[c0000001fa8e7bc0] c00000000027bf0c shrink_
[c0000001fa8e7c80] c00000000027d500 kswapd+0x460/0x990
[c0000001fa8e7d80] c0000000000fd120 kthread+0x110/0x130
[c0000001fa8e7e30] c0000000000098f0 ret_from_
== Comment: #2 - Santhosh G <email address hidden> - 2016-09-27 04:28:02 ==
A similar issue is observed when the mm tests in LTP are run.
Call trace output:
oom01 0 TINFO [ 2577.866629] Unable to handle kernel paging request for data at address 0x42000000004311d0
[ 2577.866759] Faulting instruction address: 0xc00000000027b410
[ 2577.866846] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2577.866911] SMP NR_CPUS=2048 NUMA pSeries
[ 2577.866980] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
[ 2577.867152] CPU: 119 PID: 116856 Comm: oom01 Not tainted 4.8.0-17-generic #19-Ubuntu
[ 2577.867252] task: c000000db5d56000 task.stack: c00000031a898000
[ 2577.867334] NIP: c00000000027b410 LR: c00000000027b3f4 CTR: 0000000000000006
[ 2577.867433] REGS: c00000031a89b3e0 TRAP: 0300 Not tainted (4.8.0-17-generic)
[ 2577.867531] MSR: 800000010280b033 <SF,VEC,
[ 2577.867864] CFAR: c0000000000b477c DAR: 42000000004311d0 DSISR: 42000000 SOFTE: 0
GPR00: c00000000027b3f4 c00000031a89b660 c0000000014e5e00 0000000000000001
GPR04: 0000000000000005 0000000000000000 f000000000252960 0000000de7db0000
GPR08: 000000000000007d 0000000000000000 c000000001034e86 0000000000000000
GPR12: 0000000000002200 c00000000fbc2f00 f000000001ec8000 f000000001ec8020
GPR16: c000000defb93e00 0000000000000111 42000000004376d0 c000000defb8a300
GPR20: c00000031a89b690 c000000dee0a4800 0000000000000001 0000000000000005
GPR24: 0000000000023657 0000000000000100 0000000000000200 c00000031a89b680
GPR28: c00000031a89ba00 0000000000000040 c000000dee0a4800 c00000031a89b6a0
[ 2577.869185] NIP [c00000000027b410] shrink_
[ 2577.869268] LR [c00000000027b3f4] shrink_
[ 2577.869349] Call Trace:
[ 2577.869385] [c00000031a89b660] [c00000000027b3f4] shrink_
[ 2577.869518] [c00000031a89b750] [c00000000027bc70] shrink_
[ 2577.869633] [c00000031a89b870] [c00000000027bf0c] shrink_
[ 2577.869733] [c00000031a89b930] [c00000000027c308] do_try_
[ 2577.869849] [c00000031a89b9e0] [c00000000027c74c] try_to_
[ 2577.869963] [c00000031a89ba70] [c000000000264afc] __alloc_
[ 2577.870081] [c00000031a89bc30] [c0000000002e1758] alloc_pages_
[ 2577.870181] [c00000031a89bcc0] [c0000000002ac5d4] handle_
[ 2577.870299] [c00000031a89bd80] [c000000000b90d50] do_page_
[ 2577.870435] [c00000031a89be30] [c000000000008948] handle_
[ 2577.870532] Instruction dump:
[ 2577.870578] 4bffbc19 7cb100d0 7ee4bb78 7e639b78 4800dbf9 60000000 892d023c 2f890000
[ 2577.870716] 409e01a4 7c2004ac 39200000 38600001 <91329b00> 4bd99b85 60000000 7fe3fb78
[ 2577.870845] ---[ end trace b2b062e289b7708f ]---
[ 2577.873701]
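The instruction dump and the registers in both traces agree with each other. The word in angle brackets, 91329b00, decodes as `stw r9,-25856(r18)`, a 32-bit store of r9, which the preceding `li r9,0` (39200000) sets to zero. Adding the displacement to r18 reproduces DAR in both traces (xmon: R18 0x42000000000cbdd0 gives DAR 0x42000000000c58d0; oops: GPR18 0x42000000004376d0 gives DAR 0x42000000004311d0), and DSISR 0x42000000 has the "no translation" and "store" bits set. This suggests the reclaim path stored through an r18-based pointer that has no valid mapping. The bash sketch below reproduces the arithmetic; the function names are hypothetical, the bit values are the `DSISR_NOHPTE` and `DSISR_ISSTORE` definitions from the kernel's `arch/powerpc/include/asm/reg.h`, and treating the bracketed word as the faulting instruction follows the usual oops convention.

```bash
#!/usr/bin/env bash
# Print "opcode rs ra d" for a PowerPC D-form instruction word.
decode_dform() {
    local insn=$(( $1 ))
    local opcode=$(( (insn >> 26) & 0x3f ))   # 36 = stw
    local rs=$(( (insn >> 21) & 0x1f ))
    local ra=$(( (insn >> 16) & 0x1f ))
    local d=$(( insn & 0xffff ))
    if (( d >= 0x8000 )); then
        d=$(( d - 0x10000 ))                  # sign-extend the 16-bit displacement
    fi
    echo "$opcode $rs $ra $d"
}

# Effective address = (RA) + displacement (assumes RA is not r0).
effective_address() {
    local ra_value=$(( $1 ))
    local d=$(( $2 ))
    printf '0x%016x\n' $(( ra_value + d ))
}

# Name the DSISR bits relevant here (values from asm/reg.h).
decode_dsisr() {
    local dsisr=$(( $1 ))
    local -a flags=()
    (( dsisr & 0x40000000 )) && flags+=(NOHPTE)    # no translation found
    (( dsisr & 0x02000000 )) && flags+=(ISSTORE)   # access was a store
    echo "${flags[*]}"
}
```

For example, `decode_dform 0x91329b00` prints `36 9 18 -25856`, `effective_address 0x42000000000cbdd0 -25856` prints `0x42000000000c58d0`, and `decode_dsisr 0x42000000` prints `NOHPTE ISSTORE`.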
== Comment: #3 - Chandan Kumar <email address hidden> - 2016-09-27 05:18:41 ==
== Comment: #13 - Laurent Dufour <email address hidden> - 2016-10-04 11:51:59 ==
== Comment: #14 - Laurent Dufour <email address hidden> - 2016-10-05 04:18:52 ==
== Comment: #15 - Laurent Dufour <email address hidden> - 2016-10-05 05:12:41 ==
== Comment: #17 - Luciano Chavez <email address hidden> - 2016-10-05 15:40:06 ==
== Comment: #22 - Richard M. Scheller <email address hidden> - 2016-10-06 22:21:26 ==
(In reply to comment #21)
> Patched ubuntu kernel packages based on 4.8.0-19.21 are available here:
> http://
>
> laurent@test1:~$ uname -v
> #21+bz146511 SMP Thu Oct 6 16:37:38 CEST 2016
>
> Please give it a try.
I have run this patched kernel on four guests on my Ubuntu 16.10 KVM host. Three of my guests are NOT backed by huge pages; the fourth is. All four of these guests have PCI passthrough adapters.
All four of these guests crashed and rebooted within a few hours with out-of-memory errors, both with the standard Ubuntu 4.8.0-19 kernel and with this patched kernel.
There are five other guests on the same host system which do not have PCI passthrough adapters. None of these guests are reproducing the out-of-memory errors, despite running the same test suites.
tags: added: targetmilestone-inin1610; removed: targetmilestone-inin---
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: Incomplete → Triaged