HW test stress-ng-cpu-long timeout, page allocation failure in stress-ng-numa
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Invalid | Undecided | Unassigned | |
Stress-ng | Won't Fix | Undecided | Unassigned | |
Bug Description
# Issue description
When running the stress-ng-cpu-long and memtester hardware tests, both tests time out. Inspecting the running node, I can see kernel traces from stress-ng-numa while memtester is still running. I believe the memtester timeout could be a separate issue, so I'll focus on stress-ng-cpu-long for this bug report.
# Steps taken
Ran the stress-ng-cpu-long and memtester hardware testing scripts via the MAAS web UI.
# Expected result
Stress-ng-cpu-long finishes running and returns a pass/fail verdict.
# Actual result
Stress-ng-cpu-long times out
# Additional details
The node was booted and started running stress-ng-cpu-long at Dec 10 05:50:36; stat on stress-ng-cpu-long gives the last mtime as 2020-12-10 17:55:20.
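As a rough sanity check on these timestamps, the elapsed time can be compared with the 12-hour `--timeout` that the MAAS script passes to stress-ng. This is only a sketch: the year of the start time is assumed to be 2020 to match the mtime, the trace time is taken from the first kern.log line quoted in this report, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps taken from this report; the year of the start time is an
# assumption (2020, to match the mtime reported by stat).
test_start = datetime(2020, 12, 10, 5, 50, 36)
last_mtime = datetime(2020, 12, 10, 17, 55, 20)
trace_time = datetime(2020, 12, 10, 18, 27, 11)  # first kern.log fault line

# --timeout 12h from the stress-ng command line used by the MAAS script.
stress_ng_timeout = timedelta(hours=12)

elapsed_at_mtime = last_mtime - test_start
elapsed_at_trace = trace_time - test_start

print(elapsed_at_mtime)                      # 12:04:44
print(elapsed_at_trace)                      # 12:36:35
print(elapsed_at_trace > stress_ng_timeout)  # True
```

If the start time above is accurate, the kernel trace was logged roughly 36 minutes after the 12-hour stress-ng timeout should have expired, which would be consistent with the stressor still running past its deadline.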
In kern.log I can see these faults:
Dec 10 18:27:11 direct-locust kernel: [45520.184146] stress-ng-numa: page allocation failure: order:0, mode:0x14600ca(
Dec 10 18:27:11 direct-locust kernel: [45520.184149] stress-ng-numa cpuset=/ mems_allowed=0-1
Dec 10 18:27:11 direct-locust kernel: [45520.184155] CPU: 60 PID: 79803 Comm: stress-ng-numa Not tainted 4.15.0-124-generic #127-Ubuntu
Dec 10 18:27:11 direct-locust kernel: [45520.184157] Hardware name: Supermicro SYS-2029TP-
Dec 10 18:27:11 direct-locust kernel: [45520.184158] Call Trace:
Dec 10 18:27:11 direct-locust kernel: [45520.184167] dump_stack+
Dec 10 18:27:11 direct-locust kernel: [45520.184174] warn_alloc+
Dec 10 18:27:11 direct-locust kernel: [45520.184179] ? find_next_
Dec 10 18:27:11 direct-locust kernel: [45520.184182] __alloc_
Dec 10 18:27:11 direct-locust kernel: [45520.184186] __alloc_
Dec 10 18:27:11 direct-locust kernel: [45520.184192] new_node_
Dec 10 18:27:11 direct-locust kernel: [45520.184196] migrate_
Dec 10 18:27:11 direct-locust kernel: [45520.184198] ? policy_
Dec 10 18:27:11 direct-locust kernel: [45520.184201] migrate_
Dec 10 18:27:11 direct-locust kernel: [45520.184205] do_migrate_
Dec 10 18:27:11 direct-locust kernel: [45520.184210] SYSC_migrate_
Dec 10 18:27:11 direct-locust kernel: [45520.184214] SyS_migrate_
Dec 10 18:27:11 direct-locust kernel: [45520.184216] ? SyS_migrate_
Dec 10 18:27:11 direct-locust kernel: [45520.184220] do_syscall_
Dec 10 18:27:11 direct-locust kernel: [45520.184224] entry_SYSCALL_
Dec 10 18:27:11 direct-locust kernel: [45520.184226] RIP: 0033:0x7f4747989639
Dec 10 18:27:11 direct-locust kernel: [45520.184228] RSP: 002b:00007ffd7d
Dec 10 18:27:11 direct-locust kernel: [45520.184230] RAX: ffffffffffffffda RBX: 00007ffd7dc7cff0 RCX: 00007f4747989639
Dec 10 18:27:11 direct-locust kernel: [45520.184231] RDX: 00007ffd7dc7cbd0 RSI: 0000000000000400 RDI: 00000000000137bb
Dec 10 18:27:11 direct-locust kernel: [45520.184233] RBP: 00007ffd7dc7f0d0 R08: 0000000000000000 R09: 00007f474494f000
Dec 10 18:27:11 direct-locust kernel: [45520.184234] R10: 00007ffd7dc7cde0 R11: 0000000000000246 R12: 00007ffd7dc7abc0
Dec 10 18:27:11 direct-locust kernel: [45520.184235] R13: 0000000000001000 R14: 0000000000000400 R15: 00007ffd7dc7cde0
Dec 10 18:27:11 direct-locust kernel: [45520.184238] Mem-Info:
Dec 10 18:27:11 direct-locust kernel: [45520.184272] active_anon:350128 inactive_
Dec 10 18:27:11 direct-locust kernel: [45520.184272] active_file:26312 inactive_file:65527 isolated_file:257
Dec 10 18:27:11 direct-locust kernel: [45520.184272] unevictable:
Dec 10 18:27:11 direct-locust kernel: [45520.184272] slab_reclaimabl
Dec 10 18:27:11 direct-locust kernel: [45520.184272] mapped:109157 shmem:206819 pagetables:272443 bounce:0
Dec 10 18:27:11 direct-locust kernel: [45520.184272] free:1045944 free_pcp:19528 free_cma:0
Dec 10 18:27:11 direct-locust kernel: [45520.184277] Node 0 active_
Dec 10 18:27:11 direct-locust kernel: [45520.184281] Node 1 active_
Dec 10 18:27:11 direct-locust kernel: [45520.184289] Node 0 DMA free:15884kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15972kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184293] lowmem_reserve[]: 0 1567 256527 256527 256527
Dec 10 18:27:11 direct-locust kernel: [45520.184302] Node 0 DMA32 free:1019852kB min:272kB low:1876kB high:3480kB active_anon:9300kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:
Dec 10 18:27:11 direct-locust kernel: [45520.184307] lowmem_reserve[]: 0 0 254959 254959 254959
Dec 10 18:27:11 direct-locust kernel: [45520.184314] Node 0 Normal free:44548kB min:44644kB low:305720kB high:566796kB active_
Dec 10 18:27:11 direct-locust kernel: [45520.184319] lowmem_reserve[]: 0 0 0 0 0
Dec 10 18:27:11 direct-locust kernel: [45520.184328] Node 1 Normal free:3103492kB min:45184kB low:309412kB high:573640kB active_
Dec 10 18:27:11 direct-locust kernel: [45520.184333] lowmem_reserve[]: 0 0 0 0 0
Dec 10 18:27:11 direct-locust kernel: [45520.184335] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Dec 10 18:27:11 direct-locust kernel: [45520.184345] Node 0 DMA32: 1*4kB (U) 1*8kB (U) 2*16kB (UM) 3*32kB (UME) 3*64kB (UME) 1*128kB (E) 2*256kB (ME) 2*512kB (UE) 2*1024kB (ME) 2*2048kB (ME) 247*4096kB (M) = 1019852kB
Dec 10 18:27:11 direct-locust kernel: [45520.184355] Node 0 Normal: 1017*4kB (UME) 646*8kB (UME) 176*16kB (UME) 248*32kB (UME) 84*64kB (UME) 25*128kB (UME) 4*256kB (UE) 2*512kB (E) 1*1024kB (U) 2*2048kB (ME) 2*4096kB (M) = 43924kB
Dec 10 18:27:11 direct-locust kernel: [45520.184366] Node 1 Normal: 1120*4kB (UME) 866*8kB (UME) 15213*16kB (UME) 236*32kB (UM) 192*64kB (UME) 67*128kB (M) 49*256kB (UM) 22*512kB (ME) 4*1024kB (ME) 0*2048kB 682*4096kB (M) = 3104608kB
Dec 10 18:27:11 direct-locust kernel: [45520.184377] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
Dec 10 18:27:11 direct-locust kernel: [45520.184379] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
Dec 10 18:27:11 direct-locust kernel: [45520.184380] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
Dec 10 18:27:11 direct-locust kernel: [45520.184381] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
Dec 10 18:27:11 direct-locust kernel: [45520.184382] 298915 total pagecache pages
Dec 10 18:27:11 direct-locust kernel: [45520.184384] 0 pages in swap cache
Dec 10 18:27:11 direct-locust kernel: [45520.184385] Swap cache stats: add 0, delete 0, find 0/0
Dec 10 18:27:11 direct-locust kernel: [45520.184386] Free swap = 0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184386] Total swap = 0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184388] 133868162 pages RAM
Dec 10 18:27:11 direct-locust kernel: [45520.184389] 0 pages HighMem/MovableOnly
Dec 10 18:27:11 direct-locust kernel: [45520.184389] 2118768 pages reserved
Dec 10 18:27:11 direct-locust kernel: [45520.184390] 0 pages cma reserved
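One way to read the Mem-Info dump above is to compare each zone's `free` value with its `min` watermark. The sketch below does that for the four zone lines in the log (shortened to the fields that survive in this report) and also re-adds the per-order free list for Node 0 Normal. The helper names and the regular expressions are illustrative assumptions, not part of any MAAS or stress-ng tooling.

```python
import re

# Zone summary lines copied from the kern.log excerpt, shortened to the
# leading fields that are intact in the report.
zone_lines = [
    "Node 0 DMA free:15884kB min:0kB low:12kB high:24kB",
    "Node 0 DMA32 free:1019852kB min:272kB low:1876kB high:3480kB",
    "Node 0 Normal free:44548kB min:44644kB low:305720kB high:566796kB",
    "Node 1 Normal free:3103492kB min:45184kB low:309412kB high:573640kB",
]

# Per-order free list for Node 0 Normal, copied from the log.
node0_normal_buddy = (
    "Node 0 Normal: 1017*4kB (UME) 646*8kB (UME) 176*16kB (UME) "
    "248*32kB (UME) 84*64kB (UME) 25*128kB (UME) 4*256kB (UE) "
    "2*512kB (E) 1*1024kB (U) 2*2048kB (ME) 2*4096kB (M) = 43924kB"
)

ZONE_RE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(lines):
    """Return (node, zone, free_kb, min_kb) for zones with free < min."""
    hits = []
    for line in lines:
        m = ZONE_RE.search(line)
        if m and int(m.group(3)) < int(m.group(4)):
            hits.append(
                (int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)))
            )
    return hits


def buddy_total_kb(buddy_line):
    """Sum the 'count*sizekB' terms of a per-order free list line."""
    pairs = re.findall(r"(\d+)\*(\d+)kB", buddy_line)
    return sum(int(count) * int(size) for count, size in pairs)


print(zones_below_min(zone_lines))         # [(0, 'Normal', 44548, 44644)]
print(buddy_total_kb(node0_normal_buddy))  # 43924
```

Only Node 0 Normal is below its `min` watermark in this snapshot, while Node 1 Normal still has about 3 GB free. The summed free list agrees with the 43924kB total printed on that line of the log.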
Running ipmi-sel doesn't show any relevant errors
Versions:
Node is running bionic
kernel 4.15.0-124-generic #127-Ubuntu
stress-ng, version 0.09.25
MAAS 2.8.2 (8577-g.
Please let me know if I can provide any additional information.
description: updated
Changed in stress-ng:
status: New → Won't Fix
This looks like a stress-ng issue.
The MAAS script in question runs stress-ng with the following command line:
sudo -n stress-ng --aggressive -a 0 --class cpu,cpu-cache --ignite-cpu \
--log-brief --metrics-brief --times --tz --verify --timeout 12h