Comment 3 for bug 2040258

Po-Hsu Lin (cypressyew) wrote :

According to the test history, a1.medium did fail with "#602/p ld_dw: xor semi-random 64 bit imms, test 5 FAIL" on -1110.

In 10 manual runs of "sudo ./test_verifier" on a1.medium, test_verifier failed only once, with "#602/p ld_dw: xor semi-random 64 bit imms, test 5 FAIL".

When this happens, the following call trace can be found in dmesg:
[ 1200.472720] test_verifier: page allocation failure: order:10, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 1200.472731] CPU: 0 PID: 29256 Comm: test_verifier Not tainted 5.4.0-1113-aws #123~18.04.1-Ubuntu
[ 1200.472732] Hardware name: Amazon EC2 a1.medium/, BIOS 1.0 11/1/2018
[ 1200.472734] Call trace:
[ 1200.472740] dump_backtrace+0x0/0x208
[ 1200.472743] show_stack+0x24/0x30
[ 1200.472746] dump_stack+0xd4/0x10c
[ 1200.472749] warn_alloc+0x104/0x170
[ 1200.472751] __alloc_pages_slowpath+0xb54/0xbd0
[ 1200.472753] __alloc_pages_nodemask+0x2b8/0x340
[ 1200.472756] alloc_pages_current+0x88/0xe8
[ 1200.472758] kmalloc_order+0x28/0x90
[ 1200.472760] kmalloc_order_trace+0x3c/0xf8
[ 1200.472762] __kmalloc+0x240/0x290
[ 1200.472765] bpf_int_jit_compile+0x2f4/0x4c8
[ 1200.472768] bpf_prog_select_runtime+0xfc/0x150
[ 1200.472770] bpf_prog_load+0x618/0x810
[ 1200.472771] __do_sys_bpf+0xcbc/0x1950
[ 1200.472773] __arm64_sys_bpf+0x28/0x38
[ 1200.472775] el0_svc_common.constprop.3+0x80/0x1f8
[ 1200.472777] el0_svc_handler+0x34/0xb0
[ 1200.472779] el0_svc+0x10/0x180
[ 1200.472781] Mem-Info:
[ 1200.472786] active_anon:20287 inactive_anon:44 isolated_anon:0
                active_file:111844 inactive_file:197912 isolated_file:0
                unevictable:0 dirty:78 writeback:0 unstable:0
                slab_reclaimable:48180 slab_unreclaimable:10434
                mapped:22285 shmem:224 pagetables:467 bounce:0
                free:68082 free_pcp:667 free_cma:8118

Maybe it's because a1.medium is too small? (It has just 2 GB of RAM, vs. 64 GB on c6g.8xlarge, another ARM64 node which has not failed with this before.)
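
As a rough sanity check on the "too small" theory, the numbers from the dmesg output above can be converted into sizes. This is only a sketch: it assumes a 4 KiB page size, which the log does not state, and the helper names are made up for illustration.

```python
# Assumption: 4 KiB pages. The dmesg output does not state the page size.
PAGE_SIZE = 4096


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of one physically contiguous allocation of this order."""
    return (1 << order) * page_size


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count (as printed in Mem-Info) to MiB."""
    return pages * page_size / (1024 * 1024)


# "order:10" from the page allocation failure line
request_mib = order_to_bytes(10) / (1024 * 1024)
# "free:68082" from the Mem-Info section
free_mib = pages_to_mib(68082)

print(f"order:10 request: {request_mib:.0f} MiB, physically contiguous")
print(f"free pages:       {free_mib:.1f} MiB in total")
```

If the 4 KiB assumption holds, the failed request is for a single contiguous 4 MiB block while a few hundred MiB were free in total, so fragmentation on a small instance could be a factor rather than plain exhaustion. I have not confirmed this.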

I tried running the test manually with autotest executed locally on a1.medium, and also via remote execution over ssh with autotest from our builder, but I was unable to reproduce this issue either way.