"hw csum failure" in encapsulated network topologies

Bug #1409123 reported by Jay Vosburgh
This bug affects 2 people
Affects                      Status        Importance  Assigned to   Milestone
linux (Ubuntu)               Fix Released  Medium      Jay Vosburgh
  Precise                    Fix Released  Undecided   Unassigned
  Trusty                     Fix Released  Medium      Jay Vosburgh
  Utopic                     Fix Released  Undecided   Unassigned
  Vivid                      Fix Released  Medium      Jay Vosburgh
linux-lts-trusty (Ubuntu)    Invalid       Undecided   Unassigned
  Precise                    Fix Released  Undecided   Unassigned
  Trusty                     Invalid       Undecided   Unassigned
  Utopic                     Invalid       Undecided   Unassigned
  Vivid                      Invalid       Undecided   Unassigned
linux-lts-utopic (Ubuntu)    Invalid       Undecided   Unassigned
  Precise                    Invalid       Undecided   Unassigned
  Trusty                     Fix Released  Undecided   Unassigned
  Utopic                     Invalid       Undecided   Unassigned
  Vivid                      Invalid       Undecided   Unassigned

Bug Description

Virtualized network topologies that utilize encapsulation (e.g., VXLAN) and bridging may experience kernel errors of the format:

[ 4297.761899] eth0: hw csum failure
[ 4297.765210] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE 3.18.0-rc4-nn+ #22
[ 4297.765212] Hardware name: LENOVO 0829F3U/To be filled by O.E.M., BIOS 90KT15AUS 07/21/2010
[ 4297.765216] 0000000000000000 ffff88013fc03ba8 ffffffff8172f026 0000000000000001
[ 4297.765219] ffff88013870e000 ffff88013fc03bc8 ffffffff8162ba52 ffffffff8161c1a0
[ 4297.765221] ffff8800afdf1000 ffff88013fc03c08 ffffffff8162325c ffff88013870e000
[ 4297.765223] Call Trace:
[ 4297.765224] <IRQ> [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235] [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238] [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240] [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243] [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
[ 4297.765248] [<ffffffff81660224>] ? nf_hook_slow+0x74/0x130
[ 4297.765250] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
[ 4297.765253] [<ffffffff81666d4c>] ip_local_deliver_finish+0xac/0x220
[ 4297.765255] [<ffffffff81667058>] ip_local_deliver+0x48/0x80
[ 4297.765257] [<ffffffff816669c1>] ip_rcv_finish+0x81/0x360
[ 4297.765259] [<ffffffff81667332>] ip_rcv+0x2a2/0x3f0
[ 4297.765261] [<ffffffff8162e932>] __netif_receive_skb_core+0x562/0x7a0
[ 4297.765263] [<ffffffff8162eb88>] __netif_receive_skb+0x18/0x60
[ 4297.765265] [<ffffffff8162f8f6>] process_backlog+0xa6/0x150

The backtrace may vary; stacks descending into conntrack have also been observed:

Call Trace:
 <IRQ> [<ffffffff8171a324>] dump_stack+0x45/0x56
 [<ffffffff8161bfba>] netdev_rx_csum_fault+0x3a/0x40
 [<ffffffff81614782>] __skb_checksum_complete_head+0x62/0x70
 [<ffffffff816147a1>] __skb_checksum_complete+0x11/0x20
 [<ffffffff816a3eac>] nf_ip_checksum+0xcc/0x100
 [<ffffffffa04df33b>] udp_error+0xdb/0x1f0 [nf_conntrack]
 [<ffffffffa04d926e>] nf_conntrack_in+0xee/0xb40 [nf_conntrack]
 [<ffffffffa0307653>] ? do_execute_actions+0x2e3/0xab0 [openvswitch]
 [<ffffffffa0307e4b>] ? ovs_execute_actions+0x2b/0x30 [openvswitch]
 [<ffffffff81654540>] ? inet_del_offload+0x40/0x40
 [<ffffffffa03b52e2>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
 [<ffffffff8164e0aa>] nf_iterate+0x9a/0xb0
 [<ffffffff81654540>] ? inet_del_offload+0x40/0x40
 [<ffffffff8164e134>] nf_hook_slow+0x74/0x130
 [<ffffffff81654540>] ? inet_del_offload+0x40/0x40
 [<ffffffff81654f68>] ip_rcv+0x2f8/0x3d0

The root cause of this is twofold:

First, for devices that support CHECKSUM_COMPLETE checksum offload, the kernel's handling of forwarded encapsulated packets (e.g., VXLAN) fails to update the running checksum when the packet is decapsulated.

Second, the enic hardware itself does not compute the checksum correctly in some cases.

Both of these issues are patched in mainline:

commit 17e96834fd35997ca7cdfbf15413bcd5a36ad448
Author: Govindarajulu Varadarajan <email address hidden>
Date: Thu Dec 18 15:58:42 2014 +0530

enic: fix rx skb checksum

commit 2c26d34bbcc0b3f30385d5587aa232289e2eed8e
Author: Jay Vosburgh <email address hidden>
Date: Fri Dec 19 15:32:00 2014 -0800

net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding

===
break-fix: - 17e96834fd35997ca7cdfbf15413bcd5a36ad448
break-fix: - 2c26d34bbcc0b3f30385d5587aa232289e2eed8e

Jay Vosburgh (jvosburgh)
Changed in linux (Ubuntu):
assignee: nobody → Jay Vosburgh (jvosburgh)
Brad Figg (brad-figg) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1409123

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
tags: added: kernel-bug-fixed-upstream kernel-da-key
Andy Whitcroft (apw)
Changed in linux-lts-utopic (Ubuntu Vivid):
status: New → Invalid
Changed in linux-lts-utopic (Ubuntu Utopic):
status: New → Invalid
description: updated
tags: added: kernel-bug-break-fix
Changed in linux (Ubuntu Trusty):
assignee: nobody → Jay Vosburgh (jvosburgh)
importance: Undecided → Medium
Andy Whitcroft (apw)
Changed in linux (Ubuntu Trusty):
status: New → Confirmed
Changed in linux (Ubuntu Utopic):
status: New → Confirmed
Changed in linux-lts-utopic (Ubuntu Trusty):
status: New → Confirmed
Changed in linux (Ubuntu Vivid):
status: Triaged → Confirmed
Andy Whitcroft (apw) wrote:

These seem to be making their way to -stable as expected; I expect them to reach the relevant upstream releases in time for the next SRU cycle.

Andy Whitcroft (apw)
Changed in linux-lts-trusty (Ubuntu):
status: New → Invalid
Changed in linux-lts-utopic (Ubuntu Precise):
status: New → Invalid
Changed in linux-lts-trusty (Ubuntu Trusty):
status: New → Invalid
Changed in linux-lts-trusty (Ubuntu Utopic):
status: New → Invalid
Andy Whitcroft (apw)
Changed in linux (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-trusty (Ubuntu Precise):
status: New → Confirmed
Andy Whitcroft (apw)
Changed in linux (Ubuntu Vivid):
status: Confirmed → Fix Committed
Andy Whitcroft (apw)
Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Andy Whitcroft (apw)
Changed in linux (Ubuntu Utopic):
status: Confirmed → Fix Committed
Andy Whitcroft (apw)
Changed in linux (Ubuntu Trusty):
status: Confirmed → Fix Committed
Andy Whitcroft (apw)
Changed in linux-lts-trusty (Ubuntu Precise):
status: Confirmed → Fix Committed
Changed in linux-lts-utopic (Ubuntu Trusty):
status: Confirmed → Fix Committed
Andy Whitcroft (apw)
Changed in linux (Ubuntu Precise):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in linux-lts-trusty (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Changed in linux-lts-utopic (Ubuntu Trusty):
status: Fix Committed → Fix Released
Andy Whitcroft (apw)
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
tags: removed: kernel-bug-break-fix