test_verifier - 2 failures reported with 5.4 kernel

Bug #2040258 reported by Kuba Pawlak
Affects: ubuntu-kernel-tests
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

  SRU Cycle: 2023.10.02
  Series: bionic
  Package: linux-aws-5.4
  Version: 5.4.0-1113.123~18.04.1
  Cloud: aws
  Instance: a1.medium
  Region: us-west-2
  Operation: sru

Failed:
        #602/p ld_dw: xor semi-random 64 bit imms, test 5 FAIL

        #777/p scale: scale test 1 FAIL

Full log attached.

Kuba Pawlak (kuba-t-pawlak) wrote :
Po-Hsu Lin (cypressyew)
tags: added: 5.4 aws ubuntu-bpf
Po-Hsu Lin (cypressyew) wrote :

This issue does not exist with B-aws-5.4.0-1112.121~18.04.2 from s2023.09.04 on a1.medium.

It is 100% reproducible with 5.4.0-1113.123~18.04.1, so we will need to check whether this is a real regression.

Po-Hsu Lin (cypressyew) wrote :

From the test history, a1.medium did fail with "#602/p ld_dw: xor semi-random 64 bit imms, test 5 FAIL" on -1110.

In 10 manual runs of "sudo ./test_verifier" on a1.medium, the test failed only once, again with "#602/p ld_dw: xor semi-random 64 bit imms, test 5 FAIL".
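
For reference, a rough way to repeat just the two affected tests is sketched below. The selftests path, the from/to test-number arguments accepted by test_verifier, and the exact test numbers (which are indices into this kernel's tests[] array) are assumptions about the 5.4 selftest build:

    # Hedged sketch: loop the two failing tests and keep only their verdict lines.
    # Assumes the bpf selftests are built in tools/testing/selftests/bpf and that
    # test_verifier takes optional "from to" test numbers.
    cd tools/testing/selftests/bpf
    for i in $(seq 1 10); do
        sudo ./test_verifier 602 602 2>&1 | grep '^#602/'
        sudo ./test_verifier 777 777 2>&1 | grep '^#777/'
    done

Note that a failure which depends on memory pressure and fragmentation may be less likely to show up when a single test is run in isolation than during a full test_verifier pass.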

When this failure happens, the following call trace can be found in dmesg:
[ 1200.472720] test_verifier: page allocation failure: order:10, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 1200.472731] CPU: 0 PID: 29256 Comm: test_verifier Not tainted 5.4.0-1113-aws #123~18.04.1-Ubuntu
[ 1200.472732] Hardware name: Amazon EC2 a1.medium/, BIOS 1.0 11/1/2018
[ 1200.472734] Call trace:
[ 1200.472740] dump_backtrace+0x0/0x208
[ 1200.472743] show_stack+0x24/0x30
[ 1200.472746] dump_stack+0xd4/0x10c
[ 1200.472749] warn_alloc+0x104/0x170
[ 1200.472751] __alloc_pages_slowpath+0xb54/0xbd0
[ 1200.472753] __alloc_pages_nodemask+0x2b8/0x340
[ 1200.472756] alloc_pages_current+0x88/0xe8
[ 1200.472758] kmalloc_order+0x28/0x90
[ 1200.472760] kmalloc_order_trace+0x3c/0xf8
[ 1200.472762] __kmalloc+0x240/0x290
[ 1200.472765] bpf_int_jit_compile+0x2f4/0x4c8
[ 1200.472768] bpf_prog_select_runtime+0xfc/0x150
[ 1200.472770] bpf_prog_load+0x618/0x810
[ 1200.472771] __do_sys_bpf+0xcbc/0x1950
[ 1200.472773] __arm64_sys_bpf+0x28/0x38
[ 1200.472775] el0_svc_common.constprop.3+0x80/0x1f8
[ 1200.472777] el0_svc_handler+0x34/0xb0
[ 1200.472779] el0_svc+0x10/0x180
[ 1200.472781] Mem-Info:
[ 1200.472786] active_anon:20287 inactive_anon:44 isolated_anon:0
                active_file:111844 inactive_file:197912 isolated_file:0
                unevictable:0 dirty:78 writeback:0 unstable:0
                slab_reclaimable:48180 slab_unreclaimable:10434
                mapped:22285 shmem:224 pagetables:467 bounce:0
                free:68082 free_pcp:667 free_cma:8118

Maybe it's because a1.medium is too small? (It has just 2 GB of RAM, vs. 64 GB on another arm64 node, c6g.8xlarge, which did not fail with this before.)
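
For a rough sense of scale (assuming the usual 4 KiB page size on Ubuntu arm64), the order:10 request that kmalloc_order rounds the size up to means 2^10 physically contiguous pages:

    # order:10 = 1024 contiguous 4 KiB pages requested as a single chunk
    echo $(( (1 << 10) * 4096 ))    # 4194304 bytes, i.e. a 4 MiB contiguous block

With only 2 GiB of RAM and roughly 1.2 GiB of it in the page cache (active_file plus inactive_file in the Mem-Info above), a free 4 MiB contiguous block can be unavailable due to fragmentation even when plenty of reclaimable memory exists, which would fit the warn_alloc coming from __kmalloc in bpf_int_jit_compile.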

I tried a manual test with autotest executed locally on a1.medium, as well as remote execution over ssh with autotest from our builder; I was unable to reproduce this issue either way.
