rmaptest triggers warning on ARM64 node scobee-kernel with 5.15.0-1043-realtime

Bug #2030479 reported by Po-Hsu Lin
This bug affects 1 person
Affects                  Status   Importance  Assigned to  Milestone
ubuntu-kernel-tests      New      Undecided   Unassigned
linux-realtime (Ubuntu)  Invalid  Undecided   Unassigned
Jammy                    New      Undecided   Unassigned

Bug Description

Issue found with 5.15.0-1043-realtime on ARM64 node scobee-kernel

When this happens, the test takes about 50 minutes to run, far longer than the allotted 15 minutes.

[ 1576.641579] ------------[ cut here ]------------
[ 1576.641590] WARNING: CPU: 1 PID: 29 at kernel/sched/core.c:3109 set_task_cpu+0x168/0x214
[ 1576.641612] Modules linked in: binfmt_misc nls_iso8859_1 arm_spe_pmu ipmi_ssif hisi_sec2 acpi_ipmi hisi_zip hisi_hpre ecdh_generic libcurve25519_generic hisi_qm ipmi_si hns_roce_hw_v2 ecc authenc uacce ipmi_devintf ipmi_msghandler hisi_trng_v2 hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core hibmc_drm drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops realtek crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce mlx5_core cec mlxfw hisi_sas_v3_hw rc_core hns3 psample hisi_sas_main tls hclge drm libsas xhci_pci hnae3 xhci_pci_renesas ahci
[ 1576.641818] scsi_transport_sas spi_dw_mmio gpio_dwapb spi_dw aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 1576.641841] CPU: 1 PID: 29 Comm: ksoftirqd/1 Not tainted 5.15.0-1043-realtime #48-Ubuntu
[ 1576.641848] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 1576.641852] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1576.641858] pc : set_task_cpu+0x168/0x214
[ 1576.641866] lr : detach_tasks+0x138/0x4b0
[ 1576.641873] sp : ffff8000090aba20
[ 1576.641876] x29: ffff8000090aba20 x28: ffff202052e653c0 x27: ffffadf444f2a920
[ 1576.641885] x26: ffffadf4447994c0 x25: ffffadf4447994c0 x24: ffff203f7fb7ffb0
[ 1576.641893] x23: 0000000000000001 x22: ffff203f7fb7f4c0 x21: ffffadf444f27a18
[ 1576.641901] x20: 000000000000004a x19: ffff202052e653c0 x18: 0000000000000000
[ 1576.641909] x17: 0000000000000000 x16: ffffadf443558ad0 x15: 0000000000000000
[ 1576.641916] x14: ffffadf444f5a658 x13: ffffadf444f5a128 x12: 000000000000004a
[ 1576.641924] x11: 0000000000000004 x10: ffffadf444f27b50 x9 : ffffadf44294fdd8
[ 1576.641931] x8 : 000000000000004a x7 : fffffffffffffc00 x6 : 000000000000004a
[ 1576.641939] x5 : 00000000000005ee x4 : 0000000000000400 x3 : 000000000000b67e
[ 1576.641946] x2 : 0000000000000000 x1 : ffffadf443f569b8 x0 : 0000000000000001
[ 1576.641954] Call trace:
[ 1576.641956] set_task_cpu+0x168/0x214
[ 1576.641964] detach_tasks+0x138/0x4b0
[ 1576.641969] load_balance+0x260/0x834
[ 1576.641977] rebalance_domains+0x280/0x3f4
[ 1576.641983] _nohz_idle_balance.constprop.0.isra.0+0x1ec/0x34c
[ 1576.641990] run_rebalance_domains+0x84/0xb0
[ 1576.641997] __do_softirq+0x170/0x468
[ 1576.642003] run_ksoftirqd+0x80/0x150
[ 1576.642008] smpboot_thread_fn+0x260/0x2e4
[ 1576.642015] kthread+0x158/0x16c
[ 1576.642025] ret_from_fork+0x10/0x20
[ 1576.642033] ---[ end trace 0000000000000002 ]---
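
When comparing a warning like the one above across kernel builds, it can help to reduce each trace to its source location and the ordered list of functions in the call trace. The following is only a minimal sketch and not part of ubuntu-kernel-tests; the function name `parse_warning`, the regular expressions, and the `SAMPLE` excerpt (a few lines copied from the log above) are assumptions made for illustration.

```python
import re

# Matches the "WARNING: CPU: N PID: N at file:line symbol+off/len" line.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) (?P<symbol>[\w.]+)\+"
)
# Matches one call-trace frame such as "[ 1576.641956] set_task_cpu+0x168/0x214".
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def parse_warning(dmesg_text):
    """Return the warning location, symbol, and call-trace function names."""
    result = {"location": None, "symbol": None, "frames": []}
    in_trace = False
    for line in dmesg_text.splitlines():
        warn = WARN_RE.search(line)
        if warn:
            result["location"] = f"{warn['file']}:{warn['line']}"
            result["symbol"] = warn["symbol"]
            continue
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.match(line)
            if frame:
                result["frames"].append(frame["func"])
            else:
                # First non-frame line (e.g. "end trace") closes the trace.
                in_trace = False
    return result


# Excerpt copied from the log above, shortened for illustration.
SAMPLE = """\
[ 1576.641590] WARNING: CPU: 1 PID: 29 at kernel/sched/core.c:3109 set_task_cpu+0x168/0x214
[ 1576.641954] Call trace:
[ 1576.641956] set_task_cpu+0x168/0x214
[ 1576.641964] detach_tasks+0x138/0x4b0
[ 1576.641969] load_balance+0x260/0x834
[ 1576.642033] ---[ end trace 0000000000000002 ]---
"""

if __name__ == "__main__":
    print(parse_warning(SAMPLE))
```

Running the same function over the dmesg output of each kernel build would show whether the warnings share a call path even when the reported line number in kernel/sched/core.c differs between builds.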

Po-Hsu Lin (cypressyew) wrote:

This issue can be reproduced with 5.15.0-1040-realtime as well.

[ 417.611745] ------------[ cut here ]------------
[ 417.611762] WARNING: CPU: 0 PID: 15 at kernel/sched/core.c:3106 set_task_cpu+0x168/0x214
[ 417.611801] Modules linked in: binfmt_misc nls_iso8859_1 hisi_hpre arm_spe_pmu ipmi_ssif ecdh_generic hisi_zip libcurve25519_generic hisi_sec2 hns_roce_hw_v2 hisi_qm ecc authenc uacce acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler hisi_trng_v2 hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure mlx5_ib ib_uverbs ib_core hibmc_drm drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper mlx5_core syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core mlxfw realtek crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce hisi_sas_v3_hw psample hns3 hisi_sas_main tls hclge drm libsas xhci_pci hnae3 xhci_pci_renesas ahci
[ 417.612153] scsi_transport_sas spi_dw_mmio gpio_dwapb spi_dw aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 417.612201] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 5.15.0-1040-realtime #45-Ubuntu
[ 417.612216] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 417.612224] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 417.612239] pc : set_task_cpu+0x168/0x214
[ 417.612254] lr : detach_tasks+0x138/0x454
[ 417.612268] sp : ffff80000860ba20
[ 417.612274] x29: ffff80000860ba20 x28: ffff20205bf0d140 x27: ffffcc0780c0a920
[ 417.612295] x26: ffffcc07804794c0 x25: ffffcc07804794c0 x24: ffff203f7fdf5fa8
[ 417.612313] x23: 0000000000000001 x22: ffff203f7fdf54c0 x21: ffffcc0780c07a18
[ 417.612330] x20: 000000000000005f x19: ffff20205bf0d140 x18: 000000000ecb1fd6
[ 417.612347] x17: 0000000006e190ba x16: ffffcc077e8d1eb0 x15: 0000000024632e0a
[ 417.612365] x14: ffffcc0780c3a2f8 x13: ffffcc0780c39dc8 x12: 000000000000005f
[ 417.612382] x11: 0000000000000004 x10: ffffcc0780c07b50 x9 : ffffcc077e62e7d8
[ 417.612399] x8 : 000000000000005f x7 : ffffffff80000000 x6 : 000000000000005f
[ 417.612414] x5 : 00000000000001bb x4 : 0000000000000001 x3 : 0000000000000000
[ 417.612431] x2 : 000000042188ec2c x1 : ffffcc077fc36578 x0 : 0000000000000001
[ 417.612448] Call trace:
[ 417.612454] set_task_cpu+0x168/0x214
[ 417.612470] detach_tasks+0x138/0x454
[ 417.612482] load_balance+0x260/0x834
[ 417.612497] rebalance_domains+0x280/0x3f4
[ 417.612512] _nohz_idle_balance.constprop.0.isra.0+0x1ec/0x34c
[ 417.612528] run_rebalance_domains+0x84/0xb0
[ 417.612543] __do_softirq+0x170/0x468
[ 417.612554] run_ksoftirqd+0x80/0x150
[ 417.612565] smpboot_thread_fn+0x260/0x2e4
[ 417.612578] kthread+0x158/0x16c
[ 417.612594] ret_from_fork+0x10/0x20
[ 417.612610] ---[ end trace 0000000000000002 ]---

Changed in linux-realtime (Ubuntu):
status: New → Invalid
Po-Hsu Lin (cypressyew)
description: updated
summary: - rmaptest trigger warning on ARM64 node scobee-kernel with
+ rmaptest triggers warning on ARM64 node scobee-kernel with
5.15.0-1043-realtime
Po-Hsu Lin (cypressyew) wrote:

I think this issue is hardware-specific: in the last cycle, with 5.15.0-1040-realtime on node kopter-kernel, the test finished within the allotted time without any problem.

With 5.15.0-1038-realtime, this test still needs 25 minutes to finish on scobee-kernel (the test did not trigger any warning with this kernel).

Po-Hsu Lin (cypressyew) wrote:

Bah, it looks like this issue is quite random on this system: it was reproduced with 5.15.0-1036-realtime on the second attempt.

[ 698.346847] ------------[ cut here ]------------
[ 698.346856] WARNING: CPU: 0 PID: 15 at kernel/sched/core.c:3106 set_task_cpu+0x168/0x214
[ 698.346880] Modules linked in: binfmt_misc nls_iso8859_1 ipmi_ssif hisi_hpre arm_spe_pmu hns_roce_hw_v2 ecdh_generic hisi_sec2 hisi_zip libcurve25519_generic hisi_qm ecc uacce authenc acpi_ipmi ipmi_si hisi_trng_v2 ipmi_devintf hisi_uncore_hha_pmu ipmi_msghandler hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu hisi_uncore_pmu cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core hibmc_drm drm_vram_helper drm_ttm_helper ttm i2c_algo_bit drm_kms_helper mlx5_core syscopyarea sysfillrect sysimgblt fb_sys_fops cec ses enclosure realtek crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce mlxfw hisi_sas_v3_hw rc_core hns3 psample hisi_sas_main hclge tls libsas drm xhci_pci hnae3 xhci_pci_renesas ahci
[ 698.346947] scsi_transport_sas spi_dw_mmio gpio_dwapb spi_dw aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 698.346958] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 5.15.0-1036-realtime #39-Ubuntu
[ 698.346963] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B160.01 01/15/2020
[ 698.346967] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 698.346971] pc : set_task_cpu+0x168/0x214
[ 698.346973] lr : detach_tasks+0x138/0x390
[ 698.346978] sp : ffff80000860ba70
[ 698.346979] x29: ffff80000860ba70 x28: ffff0020e12abe80 x27: ffffaf19b897a918
[ 698.346982] x26: ffffaf19b83794c0 x25: ffffaf19b83794c0 x24: ffff203f7facbfa8
[ 698.346985] x23: 0000000000000001 x22: ffff203f7facb4c0 x21: ffffaf19b8977a18
[ 698.346988] x20: 0000000000000044 x19: ffff0020e12abe80 x18: 0000000000000000
[ 698.346990] x17: 0000000000000000 x16: ffffaf19b71a7860 x15: 0000000000000000
[ 698.346993] x14: 0000000000000000 x13: 0000000000000030 x12: ffffaf19b79e2b08
[ 698.346996] x11: ffffaf19b8977b50 x10: 0000000000000004 x9 : ffffaf19b6682338
[ 698.346999] x8 : 000b75952208d9a9 x7 : 0000000000e45932 x6 : 000000000000011f
[ 698.347001] x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 000000000000b67e
[ 698.347004] x2 : 0000000000000000 x1 : ffffaf19b7b4c258 x0 : 0000000000000001
[ 698.347008] Call trace:
[ 698.347010] set_task_cpu+0x168/0x214
[ 698.347013] detach_tasks+0x138/0x390
[ 698.347015] load_balance+0x228/0x6c0
[ 698.347018] rebalance_domains+0x264/0x390
[ 698.347021] _nohz_idle_balance.constprop.0.isra.0+0x1b0/0x284
[ 698.347024] run_rebalance_domains+0x6c/0x7c
[ 698.347026] __do_softirq+0x110/0x390
[ 698.347029] run_ksoftirqd+0x5c/0xdc
[ 698.347034] smpboot_thread_fn+0x2dc/0x324
[ 698.347041] kthread+0x154/0x160
[ 698.347050] ret_from_fork+0x10/0x20
[ 698.347058] ---[ end trace 0000000000000002 ]---

Test took about 13 minu...
