mlx5_core: Error cqe on cqn

Bug #1887723 reported by Martin Matuska
This bug affects 1 person
Affects                 Status   Importance  Assigned to  Milestone
linux (Ubuntu)          Expired  Undecided   Unassigned
linux-oem-5.6 (Ubuntu)  Expired  Undecided   Unassigned

Bug Description

I encountered the following repeating error with kernel 5.6.0-1018-oem. The network was disrupted and the error kept repeating for about one hour, until the system hung.

[316294.820469] mlx5_core 0000:44:00.1 enp68s0f1: Error cqe on cqn 0x816, ci 0xc5, sqn 0x1908, opcode 0xd, syndrome 0x4, vendor syndrome 0x51
[316294.833103] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[316294.833106] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[316294.833110] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[316294.833116] 00000030: 00 00 00 00 04 00 51 04 0e 00 19 08 53 64 dc d2
[316294.833118] WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x364, len: 128
[316294.833120] 00000000: 00 53 64 0e 00 19 08 07 00 00 00 08 00 00 00 00
[316294.833121] 00000010: 00 00 00 00 c0 00 05 a0 00 00 00 00 00 42 00 a3
[316294.833123] 00000020: 8e bf 47 d7 86 14 ad f8 ef 46 08 00 45 00 12 34
[316294.833124] 00000030: 76 d8 40 00 40 06 77 97 c3 a8 4a 4a 5f 67 cc fa
[316294.833126] 00000040: 01 bb d8 2a 5c 7e 3d a0 b0 c5 3e 74 80 18 00 0b
[316294.833127] 00000050: 4c 7b 00 00 01 01 08 0a 63 59 a1 46 00 41 05 b4
[316294.833129] 00000060: 00 00 12 00 00 08 01 01 00 00 00 00 c2 c6 0b 74
[316294.833130] 00000070: 00 00 00 44 00 08 01 01 00 00 00 00 c3 09 6c fc
[316294.833144] mlx5_core 0000:44:00.1 enp68s0f1: ERR CQE on SQ: 0x1908
[316294.996328] enp68s0f1: hw csum failure
[316295.000262] skb len=1500 headroom=78 headlen=1500 tailroom=22
[316295.000262] mac=(64,14) net=(78,40) trans=118
[316295.000262] shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
[316295.000262] csum(0x81a5 ip_summed=2 complete_sw=0 valid=0 level=0)
[316295.000262] hash(0x322a7dd7 sw=0 l4=1) proto=0x86dd pkttype=0 iif=0
[316295.029909] dev name=enp68s0f1 feat=0x0x0010a1821fd14ba9
...
[316295.943994] Hardware name: ASUSTeK COMPUTER INC. RS500A-E10-RS12U/KRPA-U16 Series, BIOS 0703 03/06/2020
[316295.943995] Call Trace:
[316295.943997] <IRQ>
[316295.944002] dump_stack+0x6d/0x9a
[316295.944006] netdev_rx_csum_fault.part.0+0x41/0x45
[316295.944007] __skb_gro_checksum_complete.cold+0xb/0x10
[316295.944009] tcp6_gro_receive+0xdc/0x1c0
[316295.944010] ipv6_gro_receive+0x1dc/0x460
[316295.944012] ? kmem_cache_alloc+0x16d/0x230
[316295.944017] dev_gro_receive+0x2fb/0x690
[316295.996284] ? mlx5e_build_rx_skb+0x38c/0xb60 [mlx5_core]
[316296.010778] napi_gro_receive+0x39/0x140
[316296.010793] mlx5e_handle_rx_cqe+0xa5/0x150 [mlx5_core]
[316296.010808] mlx5e_poll_rx_cq+0x7fe/0x910 [mlx5_core]
[316296.010825] mlx5e_napi_poll+0xda/0x610 [mlx5_core]
[316296.010843] ? mlx5_eq_comp_int+0x149/0x1b0 [mlx5_core]
[316296.010850] net_rx_action+0x13a/0x370
[316296.010859] __do_softirq+0xe1/0x2d6
[316296.010862] irq_exit+0xae/0xb0
[316296.010863] do_IRQ+0x5a/0xf0
[316296.010865] common_interrupt+0xf/0xf
[316296.010866] </IRQ>
[316296.010868] RIP: 0010:cpuidle_enter_state+0xca/0x3e0
[316296.010869] Code: ff e8 aa 7d 7e ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 2d 01 85 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 3f 02 00 00 49 63 d4 4c 8b 7d d0 4c 2b 7d c8 48 8d
[316296.010870] RSP: 0018:ffff9d84002cfe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[316296.010872] RAX: ffff91110b62ce00 RBX: ffff9110ac1d1c00 RCX: 000000000000001f
[316296.010872] RDX: 0000000000000000 RSI: 00000000334bfb91 RDI: 0000000000000000
[316296.010873] RBP: ffff9d84002cfe78 R08: 00011fab2ae67109 R09: 00011faebfd6b300
[316296.010873] R10: ffff91110b62bac4 R11: ffff91110b62baa4 R12: 0000000000000002
[316296.010874] R13: ffffffff8f978700 R14: 0000000000000002 R15: ffff9110ac1d1c00
[316296.010876] ? cpuidle_enter_state+0xa6/0x3e0
[316296.010878] cpuidle_enter+0x2e/0x40
[316296.010880] call_cpuidle+0x23/0x40
[316296.010881] do_idle+0x1e7/0x280
[316296.010882] cpu_startup_entry+0x20/0x30
[316296.010885] start_secondary+0x167/0x1c0
[316296.010886] secondary_startup_64+0xa4/0xb0
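
For anyone comparing several of these errors, the summary line and the CQE hex dump above can be cross-checked with a short script. The following is a minimal Python sketch, not driver code. The function names are hypothetical, and the byte offsets in `decode_tail` are an assumption inferred by matching the dump against the summary line in this report, not taken from mlx5 documentation.

```python
import re

# Summary line copied from the log above.
LINE = ("mlx5_core 0000:44:00.1 enp68s0f1: Error cqe on cqn 0x816, ci 0xc5, "
        "sqn 0x1908, opcode 0xd, syndrome 0x4, vendor syndrome 0x51")

# Last 16 bytes (offset 0x30) of the CQE hex dump from the log above.
CQE_TAIL = bytes.fromhex("00 00 00 00 04 00 51 04 0e 00 19 08 53 64 dc d2")

# "vendor syndrome" is listed before "syndrome" so the longer name wins.
FIELD_RE = re.compile(r"(cqn|ci|sqn|opcode|vendor syndrome|syndrome) (0x[0-9a-f]+)")


def parse_summary(line):
    """Return the hex fields of an 'Error cqe on cqn' line as integers."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(line)}


def decode_tail(tail):
    """Decode the final 16 bytes of the CQE dump.

    The offsets are an assumption inferred from this one report.
    """
    return {
        "vendor syndrome": tail[6],
        "syndrome": tail[7],
        "sqn": int.from_bytes(tail[9:12], "big"),
        "wqe_counter": int.from_bytes(tail[12:14], "big"),
        "opcode": tail[15] >> 4,  # high nibble of the last byte
    }


summary = parse_summary(LINE)
decoded = decode_tail(CQE_TAIL)

# The dump and the summary line agree on every shared field.
for key in ("vendor syndrome", "syndrome", "sqn", "opcode"):
    assert decoded[key] == summary[key]

# With "WQ size 1024", the counter masked to the ring size gives the
# "WQE index 0x364" printed in the WQE DUMP line.
print(hex(decoded["wqe_counter"] & (1024 - 1)))  # prints 0x364
```

If the inferred offsets hold for other dumps, the same check can flag log lines that were truncated or interleaved.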

# lspci -v -s 0000:44:00.1
44:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
 Subsystem: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
 Flags: bus master, fast devsel, latency 0, IRQ 254, NUMA node 0
 Memory at b0000000 (64-bit, prefetchable) [size=32M]
 Expansion ROM at b5300000 [disabled] [size=1M]
 Capabilities: [60] Express Endpoint, MSI 00
 Capabilities: [48] Vital Product Data
 Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
 Capabilities: [c0] Vendor Specific Information: Len=18 <?>
 Capabilities: [40] Power Management version 3
 Capabilities: [100] Advanced Error Reporting
 Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
 Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
 Capabilities: [230] Access Control Services
 Kernel driver in use: mlx5_core
 Kernel modules: mlx5_core

Tags: bot-comment
Martin Matuska (mm-vx) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1887723/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Martin Matuska (mm-vx)
affects: ubuntu → linux-oem-5.6 (Ubuntu)
Martin Matuska (mm-vx) wrote :

I can also reproduce the bug on kernel 5.4.0-40-generic.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1887723

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Martin Matuska (mm-vx) wrote :

I am unable to run the command, because the bug triggers a panic.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux-oem-5.6 (Ubuntu):
status: New → Confirmed
Martin Matuska (mm-vx) wrote :

I can confirm that the bug is avoided by booting with the kernel parameter iommu=pt.
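
To verify that a running system actually booted with this parameter, the kernel command line can be inspected. The following is a minimal Python sketch. The helper names are hypothetical, and it assumes a Linux system that exposes the boot command line at /proc/cmdline. On Ubuntu, the parameter is usually made persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub, but check your own boot setup first.

```python
def has_kernel_param(cmdline, param="iommu=pt"):
    """Return True if `param` appears as a whole token in a kernel command line."""
    return param in cmdline.split()


def booted_with_iommu_pt(path="/proc/cmdline"):
    """Check the running kernel's command line (assumes Linux exposes `path`)."""
    with open(path) as f:
        return has_kernel_param(f.read())


# Illustrative command lines, not taken from the reporter's machine.
print(has_kernel_param("BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro iommu=pt quiet"))  # prints True
print(has_kernel_param("BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet"))  # prints False
```

Matching whole tokens avoids false positives from longer parameters that merely contain the string, such as amd_iommu=pt.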

Kai-Heng Feng (kaihengfeng) wrote :
Changed in linux-oem-5.6 (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux-oem-5.6 (Ubuntu) because there has been no activity for 60 days.]

Changed in linux-oem-5.6 (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired