HW test stress-ng-cpu-long timeout, page allocation failure in stress-ng-numa

Bug #1908052 reported by Peter Sabaini
This bug affects 1 person
Affects     Status      Importance  Assigned to  Milestone
MAAS        Invalid     Undecided   Unassigned
Stress-ng   Won't Fix   Undecided   Unassigned

Bug Description

# Issue description

When running stress-ng-cpu-long and memtester, the tests time out. Inspecting the running node I can see kernel traces from stress-ng-numa while memtester is still running. I believe the memtester timeout could be a separate issue, so I'll focus on stress-ng-cpu-long in this bug report.

# Steps taken

Ran the stress-ng-cpu-long and memtester hardware testing scripts via the MAAS web UI.
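
For reference, the same tests can be triggered from the MAAS CLI; a sketch, assuming a logged-in profile named admin and using SYSTEM_ID as a placeholder for the node's system ID (neither value is from this report):

# "admin" and SYSTEM_ID are placeholders, not values from this report
maas admin machine test SYSTEM_ID testing_scripts=stress-ng-cpu-long,memtester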

# Expected result

stress-ng-cpu-long finishes running and returns a pass/fail verdict

# Actual result

stress-ng-cpu-long times out

# Additional details

The node was booted and started running stress-ng-cpu-long at Dec 10 05:50:36; stat on stress-ng-cpu-long gives a last mtime of 2020-12-10 17:55:20.
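
(The mtime was read with stat along these lines; the path is a placeholder for wherever the script output lives on the node:)

# Placeholder path -- point this at the actual script/output file on the node
stat -c '%y' /path/to/stress-ng-cpu-long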

In kern.log I can see these faults:

Dec 10 18:27:11 direct-locust kernel: [45520.184146] stress-ng-numa: page allocation failure: order:0, mode:0x14600ca(GFP_HIGHUSER_MOVABLE|__GFP_THISNODE), nodemask=(null)
Dec 10 18:27:11 direct-locust kernel: [45520.184149] stress-ng-numa cpuset=/ mems_allowed=0-1
Dec 10 18:27:11 direct-locust kernel: [45520.184155] CPU: 60 PID: 79803 Comm: stress-ng-numa Not tainted 4.15.0-124-generic #127-Ubuntu
Dec 10 18:27:11 direct-locust kernel: [45520.184157] Hardware name: Supermicro SYS-2029TP-HC1R/X11DPT-PS, BIOS 2.1 08/23/2018
Dec 10 18:27:11 direct-locust kernel: [45520.184158] Call Trace:
Dec 10 18:27:11 direct-locust kernel: [45520.184167] dump_stack+0x6d/0x8e
Dec 10 18:27:11 direct-locust kernel: [45520.184174] warn_alloc+0xff/0x1a0
Dec 10 18:27:11 direct-locust kernel: [45520.184179] ? find_next_bit+0xb/0x10
Dec 10 18:27:11 direct-locust kernel: [45520.184182] __alloc_pages_slowpath+0xdc5/0xe00
Dec 10 18:27:11 direct-locust kernel: [45520.184186] __alloc_pages_nodemask+0x29a/0x2c0
Dec 10 18:27:11 direct-locust kernel: [45520.184192] new_node_page+0xb1/0xc0
Dec 10 18:27:11 direct-locust kernel: [45520.184196] migrate_pages+0x1f8/0xb80
Dec 10 18:27:11 direct-locust kernel: [45520.184198] ? policy_node+0x40/0x40
Dec 10 18:27:11 direct-locust kernel: [45520.184201] migrate_to_node+0xab/0xf0
Dec 10 18:27:11 direct-locust kernel: [45520.184205] do_migrate_pages.part.33+0x212/0x280
Dec 10 18:27:11 direct-locust kernel: [45520.184210] SYSC_migrate_pages+0x3f9/0x420
Dec 10 18:27:11 direct-locust kernel: [45520.184214] SyS_migrate_pages+0xe/0x10
Dec 10 18:27:11 direct-locust kernel: [45520.184216] ? SyS_migrate_pages+0xe/0x10
Dec 10 18:27:11 direct-locust kernel: [45520.184220] do_syscall_64+0x73/0x130
Dec 10 18:27:11 direct-locust kernel: [45520.184224] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Dec 10 18:27:11 direct-locust kernel: [45520.184226] RIP: 0033:0x7f4747989639
Dec 10 18:27:11 direct-locust kernel: [45520.184228] RSP: 002b:00007ffd7dc7abb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000100
Dec 10 18:27:11 direct-locust kernel: [45520.184230] RAX: ffffffffffffffda RBX: 00007ffd7dc7cff0 RCX: 00007f4747989639
Dec 10 18:27:11 direct-locust kernel: [45520.184231] RDX: 00007ffd7dc7cbd0 RSI: 0000000000000400 RDI: 00000000000137bb
Dec 10 18:27:11 direct-locust kernel: [45520.184233] RBP: 00007ffd7dc7f0d0 R08: 0000000000000000 R09: 00007f474494f000
Dec 10 18:27:11 direct-locust kernel: [45520.184234] R10: 00007ffd7dc7cde0 R11: 0000000000000246 R12: 00007ffd7dc7abc0
Dec 10 18:27:11 direct-locust kernel: [45520.184235] R13: 0000000000001000 R14: 0000000000000400 R15: 00007ffd7dc7cde0
Dec 10 18:27:11 direct-locust kernel: [45520.184238] Mem-Info:
Dec 10 18:27:11 direct-locust kernel: [45520.184272] active_anon:350128 inactive_anon:129156 isolated_anon:2902
Dec 10 18:27:11 direct-locust kernel: [45520.184272] active_file:26312 inactive_file:65527 isolated_file:257
Dec 10 18:27:11 direct-locust kernel: [45520.184272] unevictable:129223410 dirty:0 writeback:0 unstable:0
Dec 10 18:27:11 direct-locust kernel: [45520.184272] slab_reclaimable:47013 slab_unreclaimable:123796
Dec 10 18:27:11 direct-locust kernel: [45520.184272] mapped:109157 shmem:206819 pagetables:272443 bounce:0
Dec 10 18:27:11 direct-locust kernel: [45520.184272] free:1045944 free_pcp:19528 free_cma:0
Dec 10 18:27:11 direct-locust kernel: [45520.184277] Node 0 active_anon:693468kB inactive_anon:217960kB active_file:168kB inactive_file:52kB unevictable:258589848kB isolated(anon):4544kB isolated(file):316kB mapped:205724kB dirty:0kB writeback:0kB shmem:338488kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Dec 10 18:27:11 direct-locust kernel: [45520.184281] Node 1 active_anon:707044kB inactive_anon:298664kB active_file:105080kB inactive_file:262056kB unevictable:258303792kB isolated(anon):7064kB isolated(file):712kB mapped:230904kB dirty:0kB writeback:0kB shmem:488788kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Dec 10 18:27:11 direct-locust kernel: [45520.184289] Node 0 DMA free:15884kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15972kB managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184293] lowmem_reserve[]: 0 1567 256527 256527 256527
Dec 10 18:27:11 direct-locust kernel: [45520.184302] Node 0 DMA32 free:1019852kB min:272kB low:1876kB high:3480kB active_anon:9300kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:615100kB writepending:0kB present:1731492kB managed:1665924kB mlocked:615100kB kernel_stack:0kB pagetables:1196kB bounce:0kB free_pcp:11104kB local_pcp:372kB free_cma:0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184307] lowmem_reserve[]: 0 0 254959 254959 254959
Dec 10 18:27:11 direct-locust kernel: [45520.184314] Node 0 Normal free:44548kB min:44644kB low:305720kB high:566796kB active_anon:684168kB inactive_anon:217960kB active_file:168kB inactive_file:48kB unevictable:257974748kB writepending:0kB present:265289728kB managed:261085024kB mlocked:257974756kB kernel_stack:15192kB pagetables:542032kB bounce:0kB free_pcp:21728kB local_pcp:72kB free_cma:0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184319] lowmem_reserve[]: 0 0 0 0 0
Dec 10 18:27:11 direct-locust kernel: [45520.184328] Node 1 Normal free:3103492kB min:45184kB low:309412kB high:573640kB active_anon:707044kB inactive_anon:298664kB active_file:105080kB inactive_file:262056kB unevictable:258303792kB writepending:0kB present:268435456kB managed:264230744kB mlocked:258303800kB kernel_stack:15416kB pagetables:546544kB bounce:0kB free_pcp:45284kB local_pcp:704kB free_cma:0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184333] lowmem_reserve[]: 0 0 0 0 0
Dec 10 18:27:11 direct-locust kernel: [45520.184335] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Dec 10 18:27:11 direct-locust kernel: [45520.184345] Node 0 DMA32: 1*4kB (U) 1*8kB (U) 2*16kB (UM) 3*32kB (UME) 3*64kB (UME) 1*128kB (E) 2*256kB (ME) 2*512kB (UE) 2*1024kB (ME) 2*2048kB (ME) 247*4096kB (M) = 1019852kB
Dec 10 18:27:11 direct-locust kernel: [45520.184355] Node 0 Normal: 1017*4kB (UME) 646*8kB (UME) 176*16kB (UME) 248*32kB (UME) 84*64kB (UME) 25*128kB (UME) 4*256kB (UE) 2*512kB (E) 1*1024kB (U) 2*2048kB (ME) 2*4096kB (M) = 43924kB
Dec 10 18:27:11 direct-locust kernel: [45520.184366] Node 1 Normal: 1120*4kB (UME) 866*8kB (UME) 15213*16kB (UME) 236*32kB (UM) 192*64kB (UME) 67*128kB (M) 49*256kB (UM) 22*512kB (ME) 4*1024kB (ME) 0*2048kB 682*4096kB (M) = 3104608kB
Dec 10 18:27:11 direct-locust kernel: [45520.184377] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 10 18:27:11 direct-locust kernel: [45520.184379] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 10 18:27:11 direct-locust kernel: [45520.184380] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 10 18:27:11 direct-locust kernel: [45520.184381] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 10 18:27:11 direct-locust kernel: [45520.184382] 298915 total pagecache pages
Dec 10 18:27:11 direct-locust kernel: [45520.184384] 0 pages in swap cache
Dec 10 18:27:11 direct-locust kernel: [45520.184385] Swap cache stats: add 0, delete 0, find 0/0
Dec 10 18:27:11 direct-locust kernel: [45520.184386] Free swap = 0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184386] Total swap = 0kB
Dec 10 18:27:11 direct-locust kernel: [45520.184388] 133868162 pages RAM
Dec 10 18:27:11 direct-locust kernel: [45520.184389] 0 pages HighMem/MovableOnly
Dec 10 18:27:11 direct-locust kernel: [45520.184389] 2118768 pages reserved
Dec 10 18:27:11 direct-locust kernel: [45520.184390] 0 pages cma reserved
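
The backtrace shows the failure surfacing from the migrate_pages(2) syscall: the numa stressor asks the kernel to move its pages between nodes while nearly all memory is mlocked (~258 GB unevictable on each node, and Node 0 Normal sits at free:44548kB, below its min:44644kB watermark), so the order-0 GFP_HIGHUSER_MOVABLE|__GFP_THISNODE allocation for the destination page cannot be satisfied. The same syscall path can be exercised outside stress-ng with the memhog(8) and migratepages(8) tools from the numactl package; a rough sketch (the size, node numbers and sleep are illustrative):

# Bind a memory hog to node 0, then ask the kernel to migrate its pages
# to node 1 -- the same SyS_migrate_pages path seen in the trace above.
numactl --membind=0 memhog -r1000 1g &
PID=$!
sleep 5
migratepages "$PID" 0 1
kill "$PID"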

Running ipmi-sel doesn't show any relevant errors
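
(Checked in-band with freeipmi's ipmi-sel; the plain invocation lists the SEL records:)

sudo ipmi-sel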

Versions:

Node is running bionic
kernel 4.15.0-124-generic #127-Ubuntu
stress-ng, version 0.09.25
MAAS 2.8.2 (8577-g.a3e674063-0ubuntu1~18.04.1)

Please let me know if I can provide any additional input

Alberto Donato (ack) wrote:

This looks like a stress-ng issue.

The mentioned maas script runs stress-ng with the following command line:

sudo -n stress-ng --aggressive -a 0 --class cpu,cpu-cache --ignite-cpu \
    --log-brief --metrics-brief --times --tz --verify --timeout 12h
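
With -a 0, stress-ng runs every stressor in the selected classes (sized to the number of online CPUs), and the numa stressor is evidently included in those classes on this version, which is why stress-ng-numa shows up in the trace. To narrow things down, the numa stressor can be run in isolation with broadly the same options; a sketch (worker count 0 means one per CPU, and the timeout is illustrative):

sudo stress-ng --numa 0 --log-brief --metrics-brief --verify --timeout 60s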

Changed in maas:
status: New → Invalid
Changed in stress-ng:
status: New → Won't Fix