Comment 0 for bug 1987029

Po-Hsu Lin (cypressyew) wrote :

Issue found on J-5.15.0-47.51 with all ARM64 instances.

This issue came up after the LTP test suite update (bug 1982995). It should not be considered a regression, since memcg_regression_test was not working at all before the update (bug 1949532).

The kernel reports the following at the end of test case 1:
[ 5481.129771] UBSAN: array-index-out-of-bounds in /build/linux-jKRxmj/linux-5.15.0/kernel/sched/deadline.c:73:10
[ 5481.139769] index 256 is out of range for type 'long unsigned int [256]'
[ 5481.146467] CPU: 13 PID: 104657 Comm: memcg_regressio Not tainted 5.15.0-46-generic #49-Ubuntu
[ 5481.146472] Hardware name: Lenovo HR330A 7X33CTO1WW /FALCON , BIOS hve104r-1.15 02/26/2021
[ 5481.146474] Call trace:
[ 5481.146476] dump_backtrace+0x0/0x1ec
[ 5481.146481] show_stack+0x24/0x30
[ 5481.146483] dump_stack_lvl+0x68/0x84
[ 5481.146486] dump_stack+0x18/0x34
[ 5481.146489] ubsan_epilogue+0x10/0x54
[ 5481.146491] __ubsan_handle_out_of_bounds+0x80/0x90
[ 5481.146495] dl_task_can_attach+0x384/0x3c0
[ 5481.146499] task_can_attach+0xa0/0xcc
[ 5481.146502] cpuset_can_attach+0xb8/0x14c
[ 5481.146506] cgroup_migrate_execute+0x9c/0x4a0
[ 5481.146509] cgroup_migrate+0x94/0xb4
[ 5481.146512] cgroup_attach_task+0x120/0x1ec
[ 5481.146514] __cgroup_procs_write+0x10c/0x1b0
[ 5481.146517] cgroup_procs_write+0x28/0x40
[ 5481.146520] cgroup_file_write+0xb0/0x1f0
[ 5481.146523] kernfs_fop_write_iter+0x134/0x1cc
[ 5481.146527] new_sync_write+0xf0/0x18c
[ 5481.146531] vfs_write+0x230/0x2d0
[ 5481.146533] ksys_write+0x74/0x100
[ 5481.146536] __arm64_sys_write+0x28/0x3c
[ 5481.146538] invoke_syscall+0x78/0x100
[ 5481.146541] el0_svc_common.constprop.0+0x54/0x184
[ 5481.146544] do_el0_svc+0x34/0x9c
[ 5481.146547] el0_svc+0x48/0x1b0
[ 5481.146550] el0t_64_sync_handler+0xa4/0x130
[ 5481.146552] el0t_64_sync+0x1a4/0x1a8
[ 5481.146555] ================================================================================
[ 5481.154990] Unable to handle kernel paging request at virtual address ffff80000a17abb0
[ 5481.162903] Mem abort info:
[ 5481.165693] ESR = 0x96000007
[ 5481.168742] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5481.174052] SET = 0, FnV = 0
[ 5481.177101] EA = 0, S1PTW = 0
[ 5481.180237] FSC = 0x07: level 3 translation fault
[ 5481.185109] Data abort info:
[ 5481.187984] ISV = 0, ISS = 0x00000007
[ 5481.191814] CM = 0, WnR = 0
[ 5481.194770] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000bf1a994000
[ 5481.201465] [ffff80000a17abb0] pgd=100000bffffff003, p4d=100000bffffff003, pud=100000bfffffe003, pmd=100000bfffffa003, pte=0000000000000000
[ 5481.213982] Internal error: Oops: 96000007 [#1] SMP
[ 5481.218848] Modules linked in: nls_iso8859_1 acpi_ipmi joydev input_leds ipmi_ssif efi_pstore xgene_hwmon cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core uas hid_generic usbhid hid usb_storage dwc3 ast ulpi drm_vram_helper udc_core drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core crct10dif_ce fb_sys_fops cec ghash_ce rc_core sha2_ce sha256_arm64 mlxfw sha1_ce nvme psample igb drm nvme_core tls i2c_algo_bit i2c_xgene_slimpro ahci_platform gpio_dwapb xhci_plat_hcd aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 5481.296632] CPU: 13 PID: 104657 Comm: memcg_regressio Not tainted 5.15.0-46-generic #49-Ubuntu
[ 5481.305230] Hardware name: Lenovo HR330A 7X33CTO1WW /FALCON , BIOS hve104r-1.15 02/26/2021
[ 5481.315042] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5481.321990] pc : dl_task_can_attach+0x70/0x3c0
[ 5481.326423] lr : dl_task_can_attach+0x384/0x3c0
[ 5481.330941] sp : ffff80004210b8d0
[ 5481.334242] x29: ffff80004210b8d0 x28: ffff000817e3ee40 x27: 0000000000000000
[ 5481.341366] x26: ffff80004210bae0 x25: 0000000000000000 x24: ffff000807041800
[ 5481.348489] x23: ffff80000a17a140 x22: 0000000000000100 x21: ffff80000a17a140
[ 5481.355613] x20: ffff80000a912818 x19: ffff80000a90dab0 x18: 0000000000000000
[ 5481.362736] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 5481.369860] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
[ 5481.376983] x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : ffff800008370c18
[ 5481.384106] x8 : 3d3d3d3d3d3d3d3d x7 : 0000000000000001 x6 : 0000000000000001
[ 5481.391229] x5 : 0000000000000000 x4 : ffff00bf5d705a88 x3 : 0000000000000000
[ 5481.398352] x2 : ffff000817e3ee40 x1 : ffff80000a90d000 x0 : ffff80000a17a140
[ 5481.405476] Call trace:
[ 5481.407909] dl_task_can_attach+0x70/0x3c0
[ 5481.411993] task_can_attach+0xa0/0xcc
[ 5481.415729] cpuset_can_attach+0xb8/0x14c
[ 5481.419727] cgroup_migrate_execute+0x9c/0x4a0
[ 5481.424158] cgroup_migrate+0x94/0xb4
[ 5481.427808] cgroup_attach_task+0x120/0x1ec
[ 5481.431978] __cgroup_procs_write+0x10c/0x1b0
[ 5481.436322] cgroup_procs_write+0x28/0x40
[ 5481.440320] cgroup_file_write+0xb0/0x1f0
[ 5481.444316] kernfs_fop_write_iter+0x134/0x1cc
[ 5481.448748] new_sync_write+0xf0/0x18c
[ 5481.452485] vfs_write+0x230/0x2d0
[ 5481.455874] ksys_write+0x74/0x100
[ 5481.459263] __arm64_sys_write+0x28/0x3c
[ 5481.463173] invoke_syscall+0x78/0x100
[ 5481.466910] el0_svc_common.constprop.0+0x54/0x184
[ 5481.471689] do_el0_svc+0x34/0x9c
[ 5481.474992] el0_svc+0x48/0x1b0
[ 5481.478122] el0t_64_sync_handler+0xa4/0x130
[ 5481.482379] el0t_64_sync+0x1a4/0x1a8
[ 5481.486030] Code: b0013734 91206294 f8767a80 8b170000 (f945381c)
[ 5481.492111] ---[ end trace 17955f4bab6956d4 ]---

Test output:
COMMAND: /opt/ltp/bin/ltp-pan -e -S -a 104516 -n 104516 -p -f /tmp/ltp-QG8KDCDEp8/alltests -l /opt/ltp/results/LTP_RUN_ON-2022_08_19-03h_35m_35s.log -C /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.failed -T /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.tconf
LOG File: /opt/ltp/results/LTP_RUN_ON-2022_08_19-03h_35m_35s.log
FAILED COMMAND File: /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.failed
TCONF COMMAND File: /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.tconf
Running tests.......
<<<test_start>>>
tag=memcg_regression stime=1660880135
cmdline="memcg_regression_test.sh"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
memcg_regression_test 1 TINFO: timeout per run is 0h 5m 0s
memcg_regression_test 1 TINFO: test starts with cgroup version 2
memcg_regression_test 1 TPASS: no kernel bug was found
memcg_regression_test 2 TCONF: Cgroup v2 found, skipping test
Test timed out, sending SIGTERM!
If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
Test is still running... 10
Test is still running... 9
Test is still running... 8
Test is still running... 7
Test is still running... 6
Test is still running... 5
Test is still running... 4
Test is still running... 3
Test is still running... 2
Test is still running... 1
Test is still running, sending SIGKILL

I tried bumping LTP_TIMEOUT_MUL to 10, but the test still does not complete. The system stops responding at this point.

Please see the attachment for the complete syslog output.