ARM64 node appleton-kernel dmesg spammed with "mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9"

Bug #1958952 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status     Importance  Assigned to  Milestone
ubuntu-kernel-tests  New        Undecided   Unassigned
linux (Ubuntu)       Confirmed  Undecided   Unassigned

Bug Description

While investigating an SRU deployment failure, I noticed that dmesg is spammed with:

Jan 25 07:48:36 appleton-kernel kernel: [ 22.885627] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885628] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1218): Completion event for bogus CQ 0x5a5aa9
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885629] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885631] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9

Issue found on Focal with kernel 5.4.0-96-generic.

The syslog is attached.
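
To gauge how heavy the spam is in a syslog like the one from this node, a minimal sketch along these lines could count the messages per device and CQ number. This is not part of the original report: the function name count_bogus_cq is hypothetical, and the regular expression assumes the exact message format quoted above, which may differ on other kernel versions.

```python
import re
from collections import Counter

# Matches the driver message quoted in this report. The source line number
# after mlx5_eq_comp_int (159 here) is allowed to vary.
BOGUS_CQ = re.compile(
    r"mlx5_core (?P<dev>[0-9a-f:.]+): mlx5_eq_comp_int:\d+:"
    r"\(pid (?P<pid>\d+)\): Completion event for bogus CQ (?P<cq>0x[0-9a-f]+)"
)


def count_bogus_cq(lines):
    """Return a Counter keyed by (PCI device, CQ number)."""
    counts = Counter()
    for line in lines:
        match = BOGUS_CQ.search(line)
        if match:
            counts[(match.group("dev"), match.group("cq"))] += 1
    return counts
```

It could be fed an open log file, for example count_bogus_cq(open("/var/log/syslog", errors="replace")), and the resulting counter printed with most_common().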

I am not sure whether this is the cause of our deployment issue, but it seems odd to me.
The deployment issue itself is as follows:
  1. The system is successfully deployed with Focal
  2. The deployment process hangs at the "Enabling PPA" stage
  3. I cannot connect to this system manually; ssh hangs (soft lockup, maybe?) after:
        Warning: Permanently added '10.229.50.13' (ECDSA) to the list of known hosts.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-96-generic 5.4.0-96.109
ProcVersionSignature: Ubuntu 5.4.0-96.109-generic 5.4.157
Uname: Linux 5.4.0-96-generic aarch64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jan 25 07:48 seq
 crw-rw---- 1 root audio 116, 33 Jan 25 07:48 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: arm64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
Date: Tue Jan 25 07:53:33 2022
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 004: ID 12d1:0003 Huawei Technologies Co., Ltd.
 Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
 Bus 001 Device 002: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/2p, 480M
     |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
     |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
         |__ Port 1: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M
         |__ Port 1: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M
MachineType: Hisilicon D05
PciMultimedia:

ProcFB: 0 hibmcdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-96-generic root=UUID=3abb8e5a-2f46-4221-b664-cb02a273a249 ro sysrq_always_enabled
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-96-generic N/A
 linux-backports-modules-5.4.0-96-generic N/A
 linux-firmware 1.187.25
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/01/2018
dmi.bios.vendor: Huawei
dmi.bios.version: 1.50
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: BC11SPCD
dmi.board.vendor: Huawei
dmi.board.version: VER.A
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Hisilicon
dmi.chassis.version: To be filled by O.E.M.
dmi.modalias: dmi:bvnHuawei:bvr1.50:bd06/01/2018:svnHisilicon:pnD05:pvrV100R001C00:rvnHuawei:rnBC11SPCD:rvrVER.A:cvnHisilicon:ct17:cvrTobefilledbyO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: D05
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: V100R001C00
dmi.sys.vendor: Hisilicon

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Po-Hsu Lin (cypressyew) wrote : Re: ARM64 node dmesg spammed with "mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9"

I can see this issue with 5.4.0-124-generic #140~18.04.1-Ubuntu on node appleton-kernel as well.

After these messages, a CPU soft lockup follows:
[ 19.296854] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296855] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296858] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296860] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.347370] mlx5_core 0005:01:00.0 enP5p1s0f0: Link down
[ 19.634790] ixgbe 000a:11:00.0: registered PHC device on enP10p17s0f0
[ 21.492952] hns-nic HISI00C2:00 enahisic2i0: link up
[ 21.492971] IPv6: ADDRCONF(NETDEV_CHANGE): enahisic2i0: link becomes ready
[ 25.794327] EXT4-fs (nvme0n1p2): resizing filesystem from 390571008 to 390572113 blocks
[ 25.794567] EXT4-fs (nvme0n1p2): resized filesystem to 390572113
[ 27.550919] new mount options do not match the existing superblock, will be ignored
[ 32.692121] fbcon: Taking over console
[ 32.698403] Console: switching to colour frame buffer device 100x37
[ 64.276773] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]
[ 64.283899] Modules linked in: nls_iso8859_1 ipmi_ssif input_leds joydev ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hibmc_drm drm_vram_helper ses enclosure ttm hid_generic usbhid ib_uverbs hid ib_core marvell drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_ce mlx5_core hisi_sas_v2_hw ghash_ce sha2_ce sha256_arm64 ixgbe sha1_ce tls hisi_sas_main nvme xfrm_algo drm megaraid_sas nvme_core mdio mlxfw libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 64.283952] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 64.283954] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[ 64.283956] pstate: 40400005 (nZcv daif +PAN -UAO)
[ 64.283962] pc : __do_softirq+0x98/0x350
[ 64.283966] lr : irq_exit+0xc0/0xc8
[ 64.283967] sp : ffff8000123b3ef0
[ 64.283969] x29: ffff8000123b3ef0 x28: ffff002fb7193d00
[ 64.283971] x27: 0000000000000000 x26: ffff8000123b4000
[ 64.283972] x25: ffff8000123b0000 x24: ffff001fba073600
[ 64.283974] x23: ffff8000127cbdb0 x22: 0000000000000000
[ 64.283976] x21: 0000000000000282 x20: 0000000000000002
[ 64.283977] x19: ffff800011b84000 x18: ffff800011268830
[ 64.283979] x17: 0000000000000000 x16: 0000000000000000
[ 64.283980] x15: 0000000000000001 x14: ffff002fbb9f21c8
[ 64.283982] x13: 0000000000000004 x12: 0000000000000003
[ 64.283984] x11: 0000000000000000 x10: 0000000000000040
[ 64.283985] x9 : ffff80001208f358 x8 : ffff80001208f350
[ 64.283987] x7 : ffff001fb9002270 x6 : 00000002a698ef5f
[ 64.283989]...


tags: added: sru-20220808
Po-Hsu Lin (cypressyew)
summary: - ARM64 node dmesg spammed with "mlx5_core 0005:01:00.0:
+ ARM64 node appleton-kernel dmesg spammed with "mlx5_core 0005:01:00.0:
mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9"