Comment 3 for bug 1958952

Po-Hsu Lin (cypressyew) wrote : Re: ARM64 node dmesg spammed with "mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9"

I can see this issue with 5.4.0-124-generic #140~18.04.1-Ubuntu on node appleton-kernel as well.

After the flood of "bogus CQ" messages, CPU#16 hits a soft lockup, followed by RCU stalls and a hung task:
[ 19.296854] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296855] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296858] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296860] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.347370] mlx5_core 0005:01:00.0 enP5p1s0f0: Link down
[ 19.634790] ixgbe 000a:11:00.0: registered PHC device on enP10p17s0f0
[ 21.492952] hns-nic HISI00C2:00 enahisic2i0: link up
[ 21.492971] IPv6: ADDRCONF(NETDEV_CHANGE): enahisic2i0: link becomes ready
[ 25.794327] EXT4-fs (nvme0n1p2): resizing filesystem from 390571008 to 390572113 blocks
[ 25.794567] EXT4-fs (nvme0n1p2): resized filesystem to 390572113
[ 27.550919] new mount options do not match the existing superblock, will be ignored
[ 32.692121] fbcon: Taking over console
[ 32.698403] Console: switching to colour frame buffer device 100x37
[ 64.276773] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]
[ 64.283899] Modules linked in: nls_iso8859_1 ipmi_ssif input_leds joydev ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hibmc_drm drm_vram_helper ses enclosure ttm hid_generic usbhid ib_uverbs hid ib_core marvell drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_ce mlx5_core hisi_sas_v2_hw ghash_ce sha2_ce sha256_arm64 ixgbe sha1_ce tls hisi_sas_main nvme xfrm_algo drm megaraid_sas nvme_core mdio mlxfw libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 64.283952] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 64.283954] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[ 64.283956] pstate: 40400005 (nZcv daif +PAN -UAO)
[ 64.283962] pc : __do_softirq+0x98/0x350
[ 64.283966] lr : irq_exit+0xc0/0xc8
[ 64.283967] sp : ffff8000123b3ef0
[ 64.283969] x29: ffff8000123b3ef0 x28: ffff002fb7193d00
[ 64.283971] x27: 0000000000000000 x26: ffff8000123b4000
[ 64.283972] x25: ffff8000123b0000 x24: ffff001fba073600
[ 64.283974] x23: ffff8000127cbdb0 x22: 0000000000000000
[ 64.283976] x21: 0000000000000282 x20: 0000000000000002
[ 64.283977] x19: ffff800011b84000 x18: ffff800011268830
[ 64.283979] x17: 0000000000000000 x16: 0000000000000000
[ 64.283980] x15: 0000000000000001 x14: ffff002fbb9f21c8
[ 64.283982] x13: 0000000000000004 x12: 0000000000000003
[ 64.283984] x11: 0000000000000000 x10: 0000000000000040
[ 64.283985] x9 : ffff80001208f358 x8 : ffff80001208f350
[ 64.283987] x7 : ffff001fb9002270 x6 : 00000002a698ef5f
[ 64.283989] x5 : 00000000ffff0031 x4 : ffff802fa9e81000
[ 64.283991] x3 : ffff800011b84780 x2 : ffff802fa9e81000
[ 64.283993] x1 : 00000000000000e0 x0 : ffff800011b84780
[ 64.283995] Call trace:
[ 64.283998] __do_softirq+0x98/0x350
[ 64.284000] irq_exit+0xc0/0xc8
[ 64.284003] __handle_domain_irq+0x6c/0xc0
[ 64.284005] gic_handle_irq+0x84/0x2c0
[ 64.284007] el1_irq+0x104/0x1c0
[ 64.284010] arch_cpu_idle+0x34/0x1c0
[ 64.284014] default_idle_call+0x24/0x60
[ 64.284016] do_idle+0x1d8/0x2b8
[ 64.284017] cpu_startup_entry+0x2c/0xb0
[ 64.284020] secondary_start_kernel+0x198/0x288
[ 98.196663] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 98.202575] rcu: 16-....: (3 GPs behind) idle=8fa/0/0x3 softirq=983/983 fqs=7488
[ 98.210133] (detected by 5, t=15002 jiffies, g=4709, q=3243)
[ 98.210134] Task dump for CPU 16:
[ 98.210137] swapper/16 R running task 0 0 1 0x0000002a
[ 98.210140] Call trace:
[ 98.210146] __switch_to+0xcc/0x210
[ 98.210149] 0x0
[ 119.928660] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 16-... } 15393 jiffies s: 229 root: 0x2/.
[ 119.939266] rcu: blocking rcu_node structures: l=1:16-31:0x1/.
[ 119.945099] Task dump for CPU 16:
[ 119.945102] swapper/16 R running task 0 0 1 0x0000002a
[ 119.945108] Call trace:
[ 119.945120] __switch_to+0xcc/0x210
[ 119.945127] 0x0
[ 242.808432] INFO: task ureadahead:1097 blocked for more than 120 seconds.
[ 242.815214] Tainted: G L 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 242.822868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.830691] ureadahead D 0 1097 1 0x00000000
[ 242.830695] Call trace:
[ 242.830703] __switch_to+0xcc/0x210
[ 242.830710] __schedule+0x310/0x7a8
[ 242.830712] schedule+0x38/0xa8
[ 242.830714] schedule_timeout+0x228/0x388
[ 242.830716] wait_for_completion+0xf4/0x4b8
[ 242.830719] __wait_rcu_gp+0x170/0x1a8
[ 242.830722] synchronize_rcu+0x68/0x98
[ 242.830725] ring_buffer_read_prepare_sync+0xc/0x18
[ 242.830727] __tracing_open+0x200/0x368
[ 242.830729] tracing_open+0xa4/0xf0
[ 242.830733] do_dentry_open+0x1cc/0x3e0
[ 242.830735] vfs_open+0x38/0x48
[ 242.830738] path_openat+0x2ac/0x1368
[ 242.830740] do_filp_open+0x88/0x108
[ 242.830742] do_sys_open+0x1b4/0x2e8
[ 242.830743] __arm64_sys_openat+0x2c/0x38
[ 242.830746] el0_svc_common.constprop.3+0x80/0x1f8
[ 242.830748] el0_svc_handler+0x34/0xa0
[ 242.830750] el0_svc+0x10/0x180
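
To quantify how badly the log is spammed and when the lockups start, the dmesg output above can be summarized with a short script. This is only a minimal sketch, assuming the log has been saved as plain text in the format shown above; the function name summarize_dmesg and the regular expressions are my own and not part of any kernel tooling.

```python
import re
from collections import Counter

# dmesg timestamp prefix, e.g. "[ 19.296854] message"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# The mlx5_core message this bug is about; captures the CQ number.
BOGUS_CQ_RE = re.compile(r"Completion event for bogus CQ (0x[0-9a-fA-F]+)")
# The watchdog soft lockup report; captures CPU number and stuck time.
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")


def summarize_dmesg(text):
    """Count bogus-CQ messages per CQ number and collect soft lockup reports.

    Hypothetical helper: returns (Counter of CQ number -> count,
    list of (timestamp, cpu, stuck_seconds) tuples).
    """
    bogus_cq_counts = Counter()
    soft_lockups = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a dmesg timestamp
        timestamp, message = float(m.group(1)), m.group(2)
        cq = BOGUS_CQ_RE.search(message)
        if cq:
            bogus_cq_counts[cq.group(1)] += 1
            continue
        lockup = LOCKUP_RE.search(message)
        if lockup:
            soft_lockups.append(
                (timestamp, int(lockup.group(1)), int(lockup.group(2)))
            )
    return bogus_cq_counts, soft_lockups
```

It can be fed the saved log, for example with summarize_dmesg(open("dmesg.txt").read()), where dmesg.txt is a placeholder file name.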