bond0: hw csum failure

Bug #1811209 reported by digrouz
This bug affects 1 person
Affects: linux-meta-hwe (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello,

I've upgraded my 18.04 system to the 4.18 HWE kernel, and since then I've been seeing messages like this:

[ 125.502876] bond0: hw csum failure
[ 125.503965] CPU: 36 PID: 0 Comm: swapper/36 Tainted: P O 4.18.0-13-generic #14~18.04.1-Ubuntu
[ 125.503966] Hardware name: Dell Inc. PowerEdge R840/08XR9M, BIOS 1.3.9 11/21/2018
[ 125.503967] Call Trace:
[ 125.503969] <IRQ>
[ 125.503974] dump_stack+0x63/0x85
[ 125.503978] netdev_rx_csum_fault+0x38/0x40
[ 125.503982] __skb_checksum_complete+0xbc/0xd0
[ 125.503986] tcp_v4_rcv+0x12f/0xae0
[ 125.503990] ip_local_deliver_finish+0x62/0x200
[ 125.503993] ip_local_deliver+0x6f/0xf0
[ 125.503996] ? ip_route_input_noref+0x28/0x40
[ 125.503999] ip_rcv_finish+0x126/0x420
[ 125.504002] ip_rcv+0x28f/0x360
[ 125.504007] __netif_receive_skb_core+0x48c/0xb70
[ 125.504010] ? __build_skb+0x2b/0xf0
[ 125.504013] ? tcp4_gro_receive+0x137/0x1a0
[ 125.504017] __netif_receive_skb+0x18/0x60
[ 125.504021] ? __netif_receive_skb+0x18/0x60
[ 125.504025] netif_receive_skb_internal+0x45/0xe0
[ 125.504029] napi_gro_receive+0xc5/0xf0
[ 125.504050] mlx5e_handle_rx_cqe+0x1a6/0x510 [mlx5_core]
[ 125.504070] mlx5e_poll_rx_cq+0xd3/0x990 [mlx5_core]
[ 125.504088] mlx5e_napi_poll+0x9b/0xc60 [mlx5_core]
[ 125.504093] net_rx_action+0x140/0x3a0
[ 125.504098] __do_softirq+0xe4/0x2bb
[ 125.504103] irq_exit+0xbc/0xd0
[ 125.504107] do_IRQ+0x8a/0xd0
[ 125.504110] common_interrupt+0xf/0xf
[ 125.504112] </IRQ>
[ 125.504116] RIP: 0010:cpuidle_enter_state+0xa5/0x2c0
[ 125.504116] Code: 8b 3d cf cc 9d 76 e8 0a 16 89 ff 48 89 c3 0f 1f 44 00 00 31 ff e8 6b 21 89 ff 45 84 ff 0f 85 c8 01 00 00 fb 66 0f 1f 44 00 00 <48> 2b 5d d0 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48
[ 125.504170] RSP: 0018:ffffb32098bc7e50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd7
[ 125.504173] RAX: ffff91fdefe62c40 RBX: 0000001d38785887 RCX: 000000000000001f
[ 125.504175] RDX: 0000001d38785887 RSI: 000000002ac44ebf RDI: 0000000000000000
[ 125.504177] RBP: ffffb32098bc7e90 R08: 0000000000000f33 R09: ffffd20140645240
[ 125.504178] R10: ffffb32098bc7e20 R11: 0000000000000f0c R12: 0000000000000003
[ 125.504180] R13: ffffd20140645240 R14: ffffffff8a384918 R15: 0000000000000000
[ 125.504185] cpuidle_enter+0x17/0x20
[ 125.504188] call_cpuidle+0x23/0x40
[ 125.504191] do_idle+0x204/0x280
[ 125.504195] cpu_startup_entry+0x73/0x80
[ 125.504199] start_secondary+0x1ab/0x200
[ 125.504202] secondary_startup_64+0xa5/0xb0

It seems to be related to a change in the 4.18 kernel, which no longer trusts the checksum reported by the mlx5 driver. This bug seems to appear on other distros too: https://access.redhat.com/solutions/3425461
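
As a rough illustration of the check behind this message: the trace goes through __skb_checksum_complete and ends in netdev_rx_csum_fault, which appears to fire when the software-computed checksum of a packet is valid but the value supplied by the NIC did not match. The sketch below is a toy model in Python, not kernel code; the function names and the `classify` helper are made up for illustration, and it ignores details such as the TCP pseudo-header.

```python
def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit one's-complement sum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum field value: complement of the one's-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def classify(packet: bytes, hw_sum: int) -> str:
    """Toy model (assumption): hw_sum is the sum the NIC claims for packet.

    A packet that carries a correct checksum sums to 0xFFFF.
    """
    sw_sum = ones_complement_sum(packet)
    if sw_sum != 0xFFFF:
        return "bad packet"       # software check fails: really corrupt
    if hw_sum != sw_sum:
        return "hw csum failure"  # packet is fine, NIC's value was wrong
    return "ok"


# Example data from RFC 1071, with its checksum appended.
payload = bytes.fromhex("0001f203f4f5f6f7")
packet = payload + internet_checksum(payload).to_bytes(2, "big")
print(classify(packet, 0xFFFF))  # NIC agrees with software
print(classify(packet, 0x1234))  # NIC reports a wrong sum
```

In this model the "hw csum failure" case means the packet itself is intact and still gets delivered; only the hardware-provided value is suspect.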

Is there a way to fix this without disabling hardware checksum offloading?
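
For reference, the workaround in question amounts to turning off rx-checksumming on the physical interfaces under the bond with `ethtool -K <dev> rx off`. A minimal sketch for checking the current state from the output of `ethtool -k` follows; the interface name `enp59s0f0` is a placeholder, not taken from this system, and the helper names are made up for illustration.

```python
import subprocess


def parse_features(ethtool_output: str) -> dict:
    """Parse `ethtool -k` output into a {feature: 'on' | 'off'} mapping."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skip blank lines and the "Features for <dev>:" header
        # Keep only the state; drop trailing markers such as "[fixed]".
        features[name.strip()] = value.split()[0]
    return features


def rx_checksumming(dev: str) -> str:
    """Return the rx-checksumming state of dev as reported by ethtool."""
    result = subprocess.run(
        ["ethtool", "-k", dev], capture_output=True, text=True, check=True
    )
    return parse_features(result.stdout).get("rx-checksumming", "unknown")


if __name__ == "__main__":
    # Placeholder interface name (assumption); use the bond's slave devices.
    print(rx_checksumming("enp59s0f0"))
```

The parsing is kept separate from the `ethtool` call so it can be exercised on saved output without touching a live interface.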

affects: snappy-hwe-snaps → linux-meta-hwe (Ubuntu)