vmxnet3 LRO IPv6 performance issues (stalling TCP)

Bug #1605494 reported by Floris Mouwen
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Trusty
Fix Released
Undecided
Tim Gardner
Xenial
Fix Released
Undecided
Tim Gardner
Yakkety
Fix Released
Undecided
Unassigned

Bug Description

We have performance problems with TCP over IPv6 with SSH/SCP and HTTP traffic on Linux guest virtual machines running on VMware ESXi (6.0u2). We are running Ubuntu 12.04.5 LTS, 14.04.3 LTS and soon 16.04.1 LTS.
If we upload (large) files with these protocols the performance go's down to 200kb/s.
With IPv4, we have a much higher performance.

This is a known issue in the vmxnet3 driver that is fixed upstream on the 21th of April:
"This is due to check sum handling issue in the vmxnet3 driver. Fix is on
the kernel tree." http://comments.gmane.org/gmane.linux.network/401727

Kernel Patch: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f0d437809d23999cb25207cfbe80c61e5703fdc1

Another example of someone who has this problem:
https://lists.debian.org/debian-kernel/2016/03/msg00001.html
https://lists.debian.org/debian-kernel/2016/03/msg00002.html

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-61-generic 3.13.0-61.100
ProcVersionSignature: Ubuntu 3.13.0-61.100-generic 3.13.11-ckt22
Uname: Linux 3.13.0-61-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jul 12 14:04 seq
 crw-rw---- 1 root audio 116, 33 Jul 12 14:04 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Fri Jul 22 08:05:09 2016
HibernationDevice: RESUME=UUID=4b881d72-9266-45cd-a12f-f21b55fe1ae7
InstallationDate: Installed on 2014-05-08 (805 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
MachineType: VMware, Inc. VMware Virtual Platform
PciMultimedia:

ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-61-generic root=/dev/mapper/system-root ro
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-61-generic N/A
 linux-backports-modules-3.13.0-61-generic N/A
 linux-firmware 1.127.22
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

dmi.bios.date: 09/21/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/21/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Revision history for this message
Floris Mouwen (florismouwen) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Yakkety):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Floris Mouwen (florismouwen) wrote :

Hi Stefan and Tim,

Is this patch also available for Ubuntu 12.04 LTS and 14.04 LTS?
The fix in -proposed works in Ubuntu 16.04.1.

Kind Regards,
Floris Mouwen

tags: added: verification-done-xenial
tags: removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (13.4 KiB)

This bug was fixed in the package linux - 4.4.0-36.55

---------------
linux (4.4.0-36.55) xenial; urgency=low

  [ Stefan Bader ]

  * Release Tracking Bug
    - LP: #1612305

  * I2C touchpad does not work on AMD platform (LP: #1612006)
    - SAUCE: pinctrl/amd: Remove the default de-bounce time

  * CVE-2016-5696
    - tcp: make challenge acks less predictable

linux (4.4.0-35.54) xenial; urgency=low

  [ Stefan Bader ]

  * Release Tracking Bug
    - LP: #1611215

  * [i915_bpo] Sync with v4.7 (LP: #1609742)
    - SAUCE: i915_bpo: Sync with v4.7

  * s390/cio: fix reset of channel measurement block (LP: #1609415)
    - s390/cio: allow to reset channel measurement block

  * in Ubuntu16.10: Hit on Call traces and system goes down when transactional
    memory tests are running in 32TB Brazos system (LP: #1606786)
    - powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
    - powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()

  * Power Menu does not display after press the Power Button (LP: #1609204)
    - intel-vbtn: new driver for Intel Virtual Button
    - [config] enable CONFIG_INTEL_VBTN=m

  * OptiPlex 7450 AIO hangs when rebooting (LP: #1608762)
    - x86/reboot: Add Dell Optiplex 7450 AIO reboot quirk

  * virtualbox+usb 3.0 breaks boot, -28 kernel works (LP: #1604058)
    - SAUCE: xhci: Fix soft lockup in xhci_pci_probe path when XHCI_STATE_HALTED

  * linux-kernel: Freeing IRQ from IRQ context (LP: #1597908)
    - block: defer timeouts to a workqueue

  * Tunnel offload indications not stripped from encapsulated packets, causing
    performance overhead (LP: #1602755)
    - tunnels: Remove encapsulation offloads on decap.

  * lm-sensors is throwing "ERROR: Can't get value of subfeature temp1_input:
    I/O error" for be2net driver (LP: #1607387)
    - be2net: perform temperature query in adapter regardless of its interface
      state

  * Dell dock MAC Address pass through doesn't work in Ubuntu (LP: #1579984)
    - r8152: Add support for setting pass through MAC address on RTL8153-AD

  * vmxnet3 LRO IPv6 performance issues (stalling TCP) (LP: #1605494)
    - Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets

  * ISST-LTE:pVM:monklp5:Ubuntu16.04.1:system crashed at
    lpfc_sli4_scmd_to_wqidx_distr (LP: #1597974)
    - SAUCE: lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from
      lpfc_send_taskmgmt()

  * Backport cxlflash shutdown patch to Xenial SRU (LP: #1605405)
    - SAUCE: cxlflash: Verify problem state area is mapped before notifying
      shutdown

  * Xenial update to v4.4.16 stable release (LP: #1607404)
    - mac80211: fix fast_tx header alignment
    - mac80211: mesh: flush mesh paths unconditionally
    - mac80211_hwsim: Add missing check for HWSIM_ATTR_SIGNAL
    - mac80211: Fix mesh estab_plinks counting in STA removal case
    - EDAC, sb_edac: Fix rank lookup on Broadwell
    - IB/cm: Fix a recently introduced locking bug
    - IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs
    - powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
    - powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
    - usb: dwc2: fix reg...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Freaky (freaky) wrote :

Hi,

having similar issues with 12.04 (with both trusty and precise kernels).

Don't see any fixes for the trusty kernel. Tried adding trusty, but launchpad gives an oops when I try to.

We have several webservers with IPv6 directly attached to internet. 2 12.04's both running with trusty kernel now and 2 16.04's.

16.04 <-> 16.04 consistently around 110MB/s (gigabit limit)
Downloading on 16.04 from the 12.04 machines is horrible. Connections die, huge drops in performance (sub 1MB/s)
12 <-> 12.04 same as above.

Please add fix to trusty kernel as well.

Kind regards,

Revision history for this message
Freaky (freaky) wrote :

Made an error in my previous comment, it's the other way around between 16.04 and 12.04.

Downloading on a 16.04 *from* a 12.04 goes fine (mostly TX for the 12.04). Downloading on 12.04 from 16.04 (or anything else) is horrible (mostly RX for 12.04).

Any incoming IPv6 on the 12.04's is rather bad and has a lot of drops.

Revision history for this message
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Trusty):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Revision history for this message
Tim Gardner (timg-tpi) wrote :
Revision history for this message
Freaky (freaky) wrote :

Installed it and speed is on par with IPv4 now.

Thanks for the quick response, much appreciated.

One question though, will the manual installation of this version get in the way of kernel updates or will it pick them up again if a newer one than 113 comes out?

Have a nice weekend!

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Newer kernels will supersede the test kernel.

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Revision history for this message
Freaky (freaky) wrote :

Hi,

kernel works fine with the patch on IPv6.

Thanks

tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-117.164

---------------
linux (3.13.0-117.164) trusty; urgency=low

  * linux: 3.13.0-117.164 -proposed tracker (LP: #1680733)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * Update ENA driver to 1.1.2 from net-next (LP: #1664312)
    - net: ena: Remove unnecessary pci_set_drvdata()
    - net: ena: Fix error return code in ena_device_init()
    - net: ena: change the return type of ena_set_push_mode() to be void.
    - net: ena: use setup_timer() and mod_timer()
    - net/ena: remove ntuple filter support from device feature list
    - net/ena: fix queues number calculation
    - net/ena: fix ethtool RSS flow configuration
    - net/ena: fix RSS default hash configuration
    - net/ena: fix NULL dereference when removing the driver after device reset
      failed
    - net/ena: refactor ena_get_stats64 to be atomic context safe
    - net/ena: fix potential access to freed memory during device reset
    - net/ena: use READ_ONCE to access completion descriptors
    - net/ena: reduce the severity of ena printouts
    - net/ena: change driver's default timeouts
    - net/ena: change condition for host attribute configuration
    - net/ena: update driver version to 1.1.2

  * [Xenial - 16.04 ]Bonding driver - stack corruption when trying to copy 20
    bytes to a sockaddr (LP: #1668042)
    - net/bonding: Enforce active-backup policy for IPoIB bonds

  * stress_smoke_test passing and exiting rc=9 (linux 4.9.0-12.13 ADT test
    failure with linux 4.9.0-12.13) (LP: #1658633)
    - ext4: lock the xattr block before checksuming it

  * vmxnet3 LRO IPv6 performance issues (stalling TCP) (LP: #1605494)
    - Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets

  * move aufs.ko from -extra to linux-image package (LP: #1673498)
    - [config] aufs.ko moved to linux-image package

  * lsattr 32bit does not work on 64bit kernel (Inappropriate ioctl error)
    (LP: #1619918)
    - btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls

 -- Kleber Sacilotto de Souza <email address hidden> Thu, 06 Apr 2017 17:52:50 +0100

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.