memcg_regression_test in ubuntu_ltp_controllers causes system hang on J-ARM64

Bug #1987029 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
ubuntu-kernel-tests  New     Undecided   Unassigned

Bug Description

Issue found on J-5.15.0-47.51 with the following ARM64 instances:
  * howzit-kernel.arm64
  * kuzzle.arm64
  * helo-kernel.arm64 (with lowlatency 64k kernel)

The only exception for the moment is:
  * appleton-kernel (with lowlatency kernel)

This issue surfaced after the LTP test suite update (bug 1982995). It should not be considered a regression, since memcg_regression_test was not working at all before the update (bug 1949532).

At the end of test case 1, the kernel reports the following:
[ 5481.129771] UBSAN: array-index-out-of-bounds in /build/linux-jKRxmj/linux-5.15.0/kernel/sched/deadline.c:73:10
[ 5481.139769] index 256 is out of range for type 'long unsigned int [256]'
[ 5481.146467] CPU: 13 PID: 104657 Comm: memcg_regressio Not tainted 5.15.0-46-generic #49-Ubuntu
[ 5481.146472] Hardware name: Lenovo HR330A 7X33CTO1WW /FALCON , BIOS hve104r-1.15 02/26/2021
[ 5481.146474] Call trace:
[ 5481.146476] dump_backtrace+0x0/0x1ec
[ 5481.146481] show_stack+0x24/0x30
[ 5481.146483] dump_stack_lvl+0x68/0x84
[ 5481.146486] dump_stack+0x18/0x34
[ 5481.146489] ubsan_epilogue+0x10/0x54
[ 5481.146491] __ubsan_handle_out_of_bounds+0x80/0x90
[ 5481.146495] dl_task_can_attach+0x384/0x3c0
[ 5481.146499] task_can_attach+0xa0/0xcc
[ 5481.146502] cpuset_can_attach+0xb8/0x14c
[ 5481.146506] cgroup_migrate_execute+0x9c/0x4a0
[ 5481.146509] cgroup_migrate+0x94/0xb4
[ 5481.146512] cgroup_attach_task+0x120/0x1ec
[ 5481.146514] __cgroup_procs_write+0x10c/0x1b0
[ 5481.146517] cgroup_procs_write+0x28/0x40
[ 5481.146520] cgroup_file_write+0xb0/0x1f0
[ 5481.146523] kernfs_fop_write_iter+0x134/0x1cc
[ 5481.146527] new_sync_write+0xf0/0x18c
[ 5481.146531] vfs_write+0x230/0x2d0
[ 5481.146533] ksys_write+0x74/0x100
[ 5481.146536] __arm64_sys_write+0x28/0x3c
[ 5481.146538] invoke_syscall+0x78/0x100
[ 5481.146541] el0_svc_common.constprop.0+0x54/0x184
[ 5481.146544] do_el0_svc+0x34/0x9c
[ 5481.146547] el0_svc+0x48/0x1b0
[ 5481.146550] el0t_64_sync_handler+0xa4/0x130
[ 5481.146552] el0t_64_sync+0x1a4/0x1a8
[ 5481.146555] ================================================================================
[ 5481.154990] Unable to handle kernel paging request at virtual address ffff80000a17abb0
[ 5481.162903] Mem abort info:
[ 5481.165693] ESR = 0x96000007
[ 5481.168742] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5481.174052] SET = 0, FnV = 0
[ 5481.177101] EA = 0, S1PTW = 0
[ 5481.180237] FSC = 0x07: level 3 translation fault
[ 5481.185109] Data abort info:
[ 5481.187984] ISV = 0, ISS = 0x00000007
[ 5481.191814] CM = 0, WnR = 0
[ 5481.194770] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000bf1a994000
[ 5481.201465] [ffff80000a17abb0] pgd=100000bffffff003, p4d=100000bffffff003, pud=100000bfffffe003, pmd=100000bfffffa003, pte=0000000000000000
[ 5481.213982] Internal error: Oops: 96000007 [#1] SMP
[ 5481.218848] Modules linked in: nls_iso8859_1 acpi_ipmi joydev input_leds ipmi_ssif efi_pstore xgene_hwmon cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core uas hid_generic usbhid hid usb_storage dwc3 ast ulpi drm_vram_helper udc_core drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core crct10dif_ce fb_sys_fops cec ghash_ce rc_core sha2_ce sha256_arm64 mlxfw sha1_ce nvme psample igb drm nvme_core tls i2c_algo_bit i2c_xgene_slimpro ahci_platform gpio_dwapb xhci_plat_hcd aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 5481.296632] CPU: 13 PID: 104657 Comm: memcg_regressio Not tainted 5.15.0-46-generic #49-Ubuntu
[ 5481.305230] Hardware name: Lenovo HR330A 7X33CTO1WW /FALCON , BIOS hve104r-1.15 02/26/2021
[ 5481.315042] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5481.321990] pc : dl_task_can_attach+0x70/0x3c0
[ 5481.326423] lr : dl_task_can_attach+0x384/0x3c0
[ 5481.330941] sp : ffff80004210b8d0
[ 5481.334242] x29: ffff80004210b8d0 x28: ffff000817e3ee40 x27: 0000000000000000
[ 5481.341366] x26: ffff80004210bae0 x25: 0000000000000000 x24: ffff000807041800
[ 5481.348489] x23: ffff80000a17a140 x22: 0000000000000100 x21: ffff80000a17a140
[ 5481.355613] x20: ffff80000a912818 x19: ffff80000a90dab0 x18: 0000000000000000
[ 5481.362736] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 5481.369860] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
[ 5481.376983] x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : ffff800008370c18
[ 5481.384106] x8 : 3d3d3d3d3d3d3d3d x7 : 0000000000000001 x6 : 0000000000000001
[ 5481.391229] x5 : 0000000000000000 x4 : ffff00bf5d705a88 x3 : 0000000000000000
[ 5481.398352] x2 : ffff000817e3ee40 x1 : ffff80000a90d000 x0 : ffff80000a17a140
[ 5481.405476] Call trace:
[ 5481.407909] dl_task_can_attach+0x70/0x3c0
[ 5481.411993] task_can_attach+0xa0/0xcc
[ 5481.415729] cpuset_can_attach+0xb8/0x14c
[ 5481.419727] cgroup_migrate_execute+0x9c/0x4a0
[ 5481.424158] cgroup_migrate+0x94/0xb4
[ 5481.427808] cgroup_attach_task+0x120/0x1ec
[ 5481.431978] __cgroup_procs_write+0x10c/0x1b0
[ 5481.436322] cgroup_procs_write+0x28/0x40
[ 5481.440320] cgroup_file_write+0xb0/0x1f0
[ 5481.444316] kernfs_fop_write_iter+0x134/0x1cc
[ 5481.448748] new_sync_write+0xf0/0x18c
[ 5481.452485] vfs_write+0x230/0x2d0
[ 5481.455874] ksys_write+0x74/0x100
[ 5481.459263] __arm64_sys_write+0x28/0x3c
[ 5481.463173] invoke_syscall+0x78/0x100
[ 5481.466910] el0_svc_common.constprop.0+0x54/0x184
[ 5481.471689] do_el0_svc+0x34/0x9c
[ 5481.474992] el0_svc+0x48/0x1b0
[ 5481.478122] el0t_64_sync_handler+0xa4/0x130
[ 5481.482379] el0t_64_sync+0x1a4/0x1a8
[ 5481.486030] Code: b0013734 91206294 f8767a80 8b170000 (f945381c)
[ 5481.492111] ---[ end trace 17955f4bab6956d4 ]---
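
The fields in the oops above can be cross-checked by hand. The following is a minimal Python sketch, not part of the original report, that splits the ESR value using the standard ESR_ELx field layout and decodes the faulting instruction word (the one in parentheses on the "Code:" line) as a 64-bit LDR with an unsigned immediate offset. The helper names decode_esr and decode_ldr_uimm64 are hypothetical, and reading x22 = 0x100 as the index 256 reported by UBSAN is an inference from the register dump rather than something the report states.

```python
# All constants below are copied from the oops in this report.
ESR = 0x96000007
FAULT_ADDR = 0xFFFF80000A17ABB0
X0 = 0xFFFF80000A17A140    # x0 in the register dump
X22 = 0x100                # x22 in the register dump
FAULT_INSN = 0xF945381C    # parenthesised word on the "Code:" line


def decode_esr(esr):
    """Split an ESR_ELx value into the fields shown under 'Mem abort info'."""
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,    # instruction length flag, bit [25]
        "ISS": esr & 0x1FFFFFF,     # instruction specific syndrome, bits [24:0]
        "WnR": (esr >> 6) & 0x1,    # write-not-read, ISS bit [6]
        "FSC": esr & 0x3F,          # fault status code, ISS bits [5:0]
    }


def decode_ldr_uimm64(insn):
    """Decode a 64-bit LDR (immediate, unsigned offset).

    Returns (rt, rn, byte_offset). Hypothetical helper for illustration.
    """
    if insn & 0xFFC00000 != 0xF9400000:
        raise ValueError("not a 64-bit LDR with unsigned immediate offset")
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8  # imm12 is scaled by the 8-byte access size


if __name__ == "__main__":
    fields = decode_esr(ESR)
    print({name: hex(value) for name, value in fields.items()})

    rt, rn, offset = decode_ldr_uimm64(FAULT_INSN)
    print(f"ldr x{rt}, [x{rn}, #{offset:#x}]")
    print(f"x{rn} + offset = {X0 + offset:#x}")
    print("matches faulting address:", X0 + offset == FAULT_ADDR)
    print("x22 as decimal:", X22)
```

Under these assumptions the decoded base register is x0, and x0 plus the decoded offset reproduces the faulting virtual address, which is consistent with the load at dl_task_can_attach+0x70 being the access that hit the unmapped page.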

Test output:
COMMAND: /opt/ltp/bin/ltp-pan -e -S -a 104516 -n 104516 -p -f /tmp/ltp-QG8KDCDEp8/alltests -l /opt/ltp/results/LTP_RUN_ON-2022_08_19-03h_35m_35s.log -C /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.failed -T /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.tconf
LOG File: /opt/ltp/results/LTP_RUN_ON-2022_08_19-03h_35m_35s.log
FAILED COMMAND File: /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.failed
TCONF COMMAND File: /opt/ltp/output/LTP_RUN_ON-2022_08_19-03h_35m_35s.tconf
Running tests.......
<<<test_start>>>
tag=memcg_regression stime=1660880135
cmdline="memcg_regression_test.sh"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
memcg_regression_test 1 TINFO: timeout per run is 0h 5m 0s
memcg_regression_test 1 TINFO: test starts with cgroup version 2
memcg_regression_test 1 TPASS: no kernel bug was found
memcg_regression_test 2 TCONF: Cgroup v2 found, skipping test
Test timed out, sending SIGTERM!
If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
Test is still running... 10
Test is still running... 9
Test is still running... 8
Test is still running... 7
Test is still running... 6
Test is still running... 5
Test is still running... 4
Test is still running... 3
Test is still running... 2
Test is still running... 1
Test is still running, sending SIGKILL

I tried bumping LTP_TIMEOUT_MUL to 10, but that did not help. The system stops responding at this point.

The complete syslog output is attached.

Po-Hsu Lin (cypressyew) wrote :
tags: added: 5.15 jammy sru-20220808 ubuntu-ltp-controllers
Po-Hsu Lin (cypressyew)
description: updated