mm:cpuset01 from ubuntu_ltp flaky on scobee-kernel with J-realtime (warning found in dmesg)

Bug #2047694 reported by Po-Hsu Lin
Affects              Status  Importance  Assigned to  Milestone
ubuntu-kernel-tests  New     Undecided   Unassigned

Bug Description

It seems this test has not been run on scobee-kernel with J-realtime before, so it is difficult to determine whether the failure is caused by the recent LTP fork update [1].

The test failed with a timeout:
INFO: Test start time: Thu Dec 21 09:12:45 UTC 2023
COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 244473 -n 244473 -f /tmp/ltp-SeaoDkJ1R1/alltests -l /dev/null -C /dev/null -T /dev/null
LOG File: /dev/null
FAILED COMMAND File: /dev/null
TCONF COMMAND File: /dev/null
Running tests.......
tst_test.c:1690: TINFO: LTP version: 20230929-185-g19ef6521d
tst_test.c:1574: TINFO: Timeout per run is 0h 00m 30s
Test timeouted, sending SIGKILL!
tst_test.c:1622: TINFO: Killed the leftover descendant processes
tst_test.c:1628: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1630: TBROK: Test killed! (timeout?)

Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0
INFO: ltp-pan reported some tests FAIL
LTP Version: 20230929-185-g19ef6521d
INFO: Test end time: Thu Dec 21 09:13:15 UTC 2023

The test also appears to trigger a kernel warning in dmesg on this system, even when the test itself passes:
[ 165.551988] ------------[ cut here ]------------
[ 165.552018] WARNING: CPU: 0 PID: 15 at kernel/sched/core.c:3109 set_task_cpu+0x168/0x244
[ 165.552083] Modules linked in: binfmt_misc nls_iso8859_1 ipmi_ssif arm_spe_pmu acpi_ipmi hisi_zip ipmi_si hns_roce_hw_v2 hisi_sec2 hisi_hpre ecdh_generic libcurve25519_generic ipmi_devintf ecc hisi_qm ipmi_msghandler authenc uacce hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_trng_v2 hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core hibmc_drm drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops cec rc_core mlxfw realtek crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce hisi_sas_v3_hw hns3 psample hisi_sas_main hclge tls xhci_pci libsas drm hnae3 xhci_pci_renesas ahci scsi_transport_sas spi_dw_mmio spi_dw gpio_dwapb
[ 165.552555] aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 165.552595] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 5.15.0-1052-realtime #58-Ubuntu
[ 165.552614] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 165.552624] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 165.552641] pc : set_task_cpu+0x168/0x244
[ 165.552660] lr : detach_tasks+0x138/0x4b0
[ 165.552684] sp : ffff80000860ba20
[ 165.552692] x29: ffff80000860ba20 x28: ffff2020075ac300 x27: ffffa9913540a928
[ 165.552718] x26: ffffa99134c794c0 x25: ffffa99134c794c0 x24: ffff003f7fbd9fb0
[ 165.552739] x23: 0000000000000001 x22: ffff003f7fbd94c0 x21: ffffa99135407a18
[ 165.552761] x20: 000000000000000d x19: ffff2020075ac300 x18: 0000000000000000
[ 165.552784] x17: ffff56ae4adda000 x16: ffffa99133a3e780 x15: 00003d094ed85380
[ 165.552805] x14: ffffa9913543a5a8 x13: ffffa9913543a078 x12: 000000000000000d
[ 165.552828] x11: 0000000000000004 x10: ffffa99135407b50 x9 : ffffa99132e30dd8
[ 165.552846] x8 : 000000000000000d x7 : ffffffffffffe000 x6 : 0000000000000314
[ 165.552866] x5 : 0000000000532ae2 x4 : 0000000000000001 x3 : 000000000000b67e
[ 165.552886] x2 : 0000000000000000 x1 : ffffa991344399b8 x0 : 0000000000000001
[ 165.552908] Call trace:
[ 165.552915] set_task_cpu+0x168/0x244
[ 165.552933] detach_tasks+0x138/0x4b0
[ 165.552948] load_balance+0x260/0x834
[ 165.552967] rebalance_domains+0x280/0x3f4
[ 165.552984] _nohz_idle_balance.constprop.0.isra.0+0x1ec/0x34c
[ 165.553004] run_rebalance_domains+0x84/0xb0
[ 165.553022] __do_softirq+0x170/0x468
[ 165.553035] run_ksoftirqd+0x80/0x150
[ 165.553052] smpboot_thread_fn+0x260/0x2e4
[ 165.553072] kthread+0x158/0x16c
[ 165.553092] ret_from_fork+0x10/0x20
[ 165.553117] ---[ end trace 0000000000000002 ]---

I ran the test manually on scobee-kernel and found it to be flaky. In some attempts it finishes within 10 seconds, but in others it takes up to 90 seconds, well over the 30-second timeout shown in the log above.

Bumping the timeout multiplier (LTP_TIMEOUT_MUL, as suggested in the test output) may be a possible solution.
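
As a rough illustration of how large the multiplier would need to be, here is a minimal sketch of the arithmetic. It assumes the effective timeout scales linearly with LTP_TIMEOUT_MUL, which has not been verified against this LTP fork. The helper name min_timeout_multiplier and the headroom parameter are hypothetical; the 30-second and 90-second figures come from the log and the manual runs above.

```python
import math


def min_timeout_multiplier(base_timeout_s, worst_runtime_s, headroom=1.0):
    """Return the smallest integer multiplier m >= 1 such that
    base_timeout_s * m >= worst_runtime_s * headroom.

    Hypothetical helper for illustration only; assumes the LTP timeout
    scales linearly with LTP_TIMEOUT_MUL.
    """
    if base_timeout_s <= 0:
        raise ValueError("base_timeout_s must be positive")
    return max(1, math.ceil(worst_runtime_s * headroom / base_timeout_s))


BASE_TIMEOUT_S = 30   # "Timeout per run is 0h 00m 30s" in the test log
WORST_RUNTIME_S = 90  # slowest manual run observed on scobee-kernel

mul = min_timeout_multiplier(BASE_TIMEOUT_S, WORST_RUNTIME_S)
print(f"export LTP_TIMEOUT_MUL={mul}")  # prints: export LTP_TIMEOUT_MUL=3
```

A multiplier of 3 only just covers the slowest run seen so far, so some headroom is probably wanted in practice; for example, min_timeout_multiplier(30, 90, headroom=1.5) gives 5.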

[1] https://lists.ubuntu.com/archives/kernel-team/2023-December/147590.html

Po-Hsu Lin (cypressyew)
description: updated
summary: mm:cpuset01 from ubuntu_ltp flaky on scobee-kernel with J-realtime
+ (warning found in dmesg)