Bad TCP/UDP checksum on e1000e with tx-checksumming on

Bug #1251464 reported by Benjamin Franz
This bug affects 1 person
Affects: linux-lts-trusty (Ubuntu)
Status: Incomplete
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

This machine is configured as a KVM virtual hosting machine running Ubuntu 12.04.3 LTS (3.5.0-41-generic) with multiple bridged ethernet ports. I found this issue on the eth1/br1 interface while configuring nagios3 plugins. I discovered that I could not connect to a mysql database server for checking by nagios3 although I could ping it from the same box. Further investigation showed I also could not traceroute to it or ssh to it from this machine but could from any other physical box we have.

Digging deeper I discovered that outbound TCP/UDP packets from this machine had incorrect checksums.

Example captured with tcpdump of an attempted ssh connection after 'ethtool -K br1 tx on' (the connection never completes):

14:47:44.923128 IP (tos 0x0, ttl 62, id 843, offset 0, flags [DF], proto: TCP (6), length: 60) 10.32.1.9.37409 > 10.96.0.10.ssh: S, cksum 0x15c1 (incorrect (-> 0x2252), 1028975122:1028975122(0) win 14600 <mss 1460,sackOK,timestamp 874244441 0,nop,wscale 7>

Then, after running 'ethtool -K br1 tx off', I got this capture (and the SSH connection completed this time):

14:50:31.061165 IP (tos 0x0, ttl 62, id 48708, offset 0, flags [DF], proto: TCP (6), length: 60) 10.32.1.9.37412 > 10.96.0.10.ssh: S, cksum 0x0435 (correct), 1835349468:1835349468(0) win 14600 <mss 1460,sackOK,timestamp 874285976 0,nop,wscale 7>

So disabling TX checksum offloading clearly worked around the problem.
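
The two captures can be checked by hand. As a minimal sketch (not part of the original report), the Python below rebuilds each SYN segment from the fields tcpdump printed and recomputes the TCP checksum the usual way: a one's-complement sum over the IPv4 pseudo-header plus the segment. It assumes a 20-byte IP header with no options (so the TCP segment is 60 - 20 = 40 bytes), that "ssh" is port 22, and that the TCP options sit in the order tcpdump lists them. The function names are made up for illustration.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Fold 16-bit big-endian words into a 16-bit one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 6, TCP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 6, tcp_len))


def build_syn(sport: int, dport: int, seq: int, window: int, tsval: int) -> bytes:
    """Rebuild the 40-byte SYN from the tcpdump fields, checksum field zeroed."""
    # Data offset is 10 words (20-byte header + 20 bytes of options); flags = SYN.
    header = struct.pack("!HHIIBBHHH", sport, dport, seq, 0, 10 << 4, 0x02,
                         window, 0, 0)
    options = (
        struct.pack("!BBH", 2, 4, 1460)          # mss 1460
        + struct.pack("!BB", 4, 2)               # sackOK
        + struct.pack("!BBII", 8, 10, tsval, 0)  # timestamp <tsval> 0
        + struct.pack("!B", 1)                   # nop
        + struct.pack("!BBB", 3, 3, 7)           # wscale 7
    )
    return header + options


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Inverted one's-complement sum over pseudo-header and segment."""
    return 0xFFFF ^ ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment)


first = build_syn(37409, 22, 1028975122, 14600, 874244441)
second = build_syn(37412, 22, 1835349468, 14600, 874285976)
print(f"{tcp_checksum('10.32.1.9', '10.96.0.10', first):#06x}")   # 0x2252
print(f"{tcp_checksum('10.32.1.9', '10.96.0.10', second):#06x}")  # 0x0435
```

Under those assumptions the first packet comes out to 0x2252, the value tcpdump said it expected, and the second to 0x0435, the value tcpdump accepted as correct.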

Testing also showed that the problem persists into hosted Ubuntu virtual machines unless they also have tx-checksumming turned off individually. The problem did not appear to affect any of my hosted CentOS5 virtual machines (running kernel 2.6.18-348.16.1.el5) - only my Ubuntu virtual machines (Ubuntu 9.10 running 2.6.31-23-server and Ubuntu 12.04.3 LTS running both 3.8.0-33-generic and 3.5.0-43-generic).
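
Since the workaround has to be applied on the host and again inside each affected guest, it can help to script the check. The following is a rough sketch only: it wraps `ethtool -k <iface>` (show features) and `ethtool -K <iface> tx off` (change them), and the helper names are hypothetical. It assumes ethtool is installed and that the script runs as root when changing settings.

```python
import subprocess


def parse_offload_features(ethtool_output: str) -> dict[str, str]:
    """Map feature names to "on"/"off" from `ethtool -k <iface>` output."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, state = line.partition(":")
        # The "Features for <iface>:" heading has nothing after the colon.
        if sep and state.strip():
            features[name.strip()] = state.strip().split()[0]  # drop "[fixed]"
    return features


def tx_checksumming(iface: str) -> str:
    """Return the current tx-checksumming state of iface."""
    out = subprocess.run(["ethtool", "-k", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_offload_features(out)["tx-checksumming"]


def disable_tx_checksumming(iface: str) -> None:
    """Apply the workaround from this report (requires root)."""
    subprocess.run(["ethtool", "-K", iface, "tx", "off"], check=True)
```

For example, `tx_checksumming("br1")` would report the state on the bridge from this report. A change made with `ethtool -K` does not survive a reboot, so it has to be reapplied at boot on every machine that needs it.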

lsb_release -rd
Description: Ubuntu 12.04.3 LTS
Release: 12.04

Expected behavior: Correct TCP/UDP checksums when using TX checksum offloading.

What happened: Incorrect TCP/UDP checksum when using TX checksum offloading.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.5.0-41-generic 3.5.0-41.64~precise1
ProcVersionSignature: Ubuntu 3.5.0-41.64~precise1-generic 3.5.7.21
Uname: Linux 3.5.0-41-generic x86_64
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Oct 5 04:15 seq
 crw-rw---T 1 root audio 116, 33 Oct 5 04:15 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu17.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Date: Thu Nov 14 14:23:17 2013
HibernationDevice: RESUME=UUID=3d0cdb3e-8b12-4a48-88ca-0b446014db10
InstallationMedia: Ubuntu-Server 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130214)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Supermicro X7DB8
MarkForUpload: True
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-41-generic root=/dev/mapper/pbox9d0-pbox9root ro
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-41-generic N/A
 linux-backports-modules-3.5.0-41-generic N/A
 linux-firmware 1.79.6
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux-lts-quantal
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/13/2007
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: X7DB8
dmi.board.vendor: Supermicro
dmi.board.version: PCB Version
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd08/13/2007:svnSupermicro:pnX7DB8:pvr0123456789:rvnSupermicro:rnX7DB8:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:
dmi.product.name: X7DB8
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Benjamin Franz (snowhare) wrote :
Jian Wen (wenjianhn) wrote :
penalvch (penalvch)
tags: added: quantal raring
penalvch (penalvch) wrote :

Benjamin Franz, the Quantal enablement kernel is EoL as per https://wiki.ubuntu.com/Kernel/LTSEnablementStack . Could you please test the Trusty enablement kernel and report the results?

affects: linux-lts-quantal (Ubuntu) → linux-lts-trusty (Ubuntu)
Changed in linux-lts-trusty (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
Peter Cordes (peter-cordes) wrote :

Apparently bad checksums in tcpdump/Wireshark captures of outgoing traffic are normal when hardware offload is enabled. The NIC only fills in the correct value after the data has been copied to the NIC, which is after the capture takes its snapshot of the buffer with whatever garbage was there. See the last point in http://docs.gz.ro/node/282.
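
The value in that snapshot is usually not arbitrary. With transmit checksum offload, Linux normally leaves the one's-complement sum of the pseudo-header in the checksum field for the hardware to complete. That is general kernel behaviour and an assumption here, not something confirmed for this machine. The sketch below computes that partial sum for the first capture in the report, assuming a 20-byte IP header; the function name is illustrative.

```python
import socket
import struct


def pseudo_header_sum(src: str, dst: str, protocol: int, l4_length: int) -> int:
    """One's-complement sum of the IPv4 pseudo-header, not inverted."""
    data = (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, protocol, l4_length))
    total = sum(struct.unpack("!6H", data))  # the pseudo-header is six 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


# First capture: 10.32.1.9 > 10.96.0.10, proto TCP (6), IP length 60.
partial = pseudo_header_sum("10.32.1.9", "10.96.0.10", 6, 60 - 20)
print(f"{partial:#06x}")  # 0x15c1
```

The result, 0x15c1, is exactly the checksum tcpdump flagged as incorrect. That is consistent with the packet having been captured (or sent) with the partial sum still in place. On its own it does not show at which point the checksum should have been completed.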

Since disabling checksum offload actually fixed your problem, maybe your NIC really was filling in checksums incorrectly (or offload wasn't really enabled, so neither the kernel nor the NIC was generating correct checksums).

Anyway, I think to properly debug this, you should have tried capturing packets from a different computer, so you can be sure you're seeing what went out over the wire with/without HW chksum enabled.
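
As a sketch of that approach: on a second machine, something like `tcpdump -n -vv -i eth0 host 10.32.1.9` (the interface name is a placeholder) prints a checksum verdict for each packet as it arrived over the wire. The hypothetical helper below pulls that verdict out of each line so the runs with offload on and off can be compared. The regular expression is written against the two lines quoted in this report and also accepts the "incorrect -> 0x..." form without the inner parenthesis; other tcpdump versions may format it differently.

```python
import re

# Matches "cksum 0x15c1 (incorrect (-> 0x2252)" and "cksum 0x0435 (correct)".
CKSUM_RE = re.compile(
    r"cksum (?P<seen>0x[0-9a-f]+) \((?P<verdict>correct|incorrect)"
    r"(?: ?\(?-> (?P<expected>0x[0-9a-f]+))?"
)


def checksum_verdict(line: str):
    """Return (seen, verdict, expected) for a tcpdump -v line, or None.

    expected is None when tcpdump reports the checksum as correct.
    """
    match = CKSUM_RE.search(line)
    if match is None:
        return None
    return match["seen"], match["verdict"], match["expected"]


def count_bad(lines) -> int:
    """Count lines whose checksum tcpdump marked as incorrect."""
    verdicts = (checksum_verdict(line) for line in lines)
    return sum(1 for v in verdicts if v is not None and v[1] == "incorrect")
```

If the receiving side still reports incorrect checksums with offload enabled on the sender, the bad values really are going out on the wire.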

TL;DR: you probably had a real issue (since disabling offload got your ssh working), but your debugging / verification technique was flawed, and would show a problem even on working hardware.
