Intel i40e PF reset due to incorrect MDD detection

Bug #1713553 reported by Dan Streetman on 2017-08-28
This bug affects 7 people
Affects          Status         Importance   Assigned to
linux (Ubuntu)   Fix Released   Medium       Dan Streetman
Xenial           Fix Released   Undecided    Unassigned

Bug Description

[Impact]

Using an Intel i40e network device, under heavy traffic load with
TSO enabled, the device will spontaneously reset itself and issue errors
similar to the following:

Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e 0000:05:00.1: TX driver issue detected, PF reset issued

This causes a full reset of the PF, which interrupts traffic flow.
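
To gauge how often a given port is affected, the messages above can be counted per PCI address. The following is a minimal sketch, not part of the original report; the function name `count_pf_resets` is hypothetical, and the regular expression assumes the exact message text and PCI address format shown in the sample lines.

```python
import re
from collections import Counter

# Matches the i40e reset message quoted in the log excerpt above.
# The PCI address format (domain:bus:device.function) is taken from
# those sample lines.
RESET_RE = re.compile(
    r"i40e (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"TX driver issue detected, PF reset issued"
)


def count_pf_resets(lines):
    """Return a Counter mapping PCI address to number of PF reset messages."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts


sample = [
    "Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e 0000:05:00.1: "
    "TX driver issue detected, PF reset issued",
    "Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e 0000:05:00.1: "
    "TX driver issue detected, PF reset issued",
    "Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e 0000:05:00.1: "
    "TX driver issue detected, PF reset issued",
]

for pci, n in count_pf_resets(sample).items():
    print(pci, n)
```

In practice the input would be the lines of a kernel log file rather than the inline sample.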

This was partially fixed by Xenial commit 12f8cc59d5886b86372f45290166deca57a60d7a; however, one additional upstream commit is required to fully fix the issue:

commit 841493a3f64395b60554afbcaa17f4350f90e764
Author: Alexander Duyck <email address hidden>
Date: Tue Sep 6 18:05:04 2016 -0700

    i40e: Limit TX descriptor count in cases where frag size is greater than 16K

This fix was never backported to the Xenial 4.4 kernel series, but it is already present in the Xenial HWE (and Zesty) 4.10 kernel.

[Testcase]

In this case, the issue occurs at a customer site using i40e-based
Intel network cards with SR-IOV enabled. Under heavy load, the card
resets itself as described.

[Regression Potential]

As with any change to a network card driver, this may cause regressions with network I/O through i40e card(s). However, this specific change only increases the likelihood that a given large TSO tx will need to be linearized, which avoids the PF reset. Linearizing a TSO tx that did not need to be linearized will not cause any failures; it may only decrease performance slightly. In any case, this patch should only cause linearization when it is required to avoid the MDD detection and PF reset.
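
The reasoning above can be illustrated with a simplified model. This is a sketch only and is not the driver's actual check: the 8K-per-descriptor figure comes from the discussion in this bug, while `MAX_DESC_PER_PACKET = 8` and both function names are assumptions made for illustration. The point is that a fragment larger than 16K needs more than two descriptors at 8K each, so counting fragments alone underestimates the descriptor count.

```python
MAX_DATA_PER_DESC = 8 * 1024   # 8K per descriptor, as discussed in this bug
MAX_DESC_PER_PACKET = 8        # ASSUMPTION: illustrative budget, not from the report


def descriptors_for_frag(size, max_per_desc=MAX_DATA_PER_DESC):
    """Descriptors needed for one fragment of `size` bytes (ceiling division)."""
    return -(-size // max_per_desc)


def needs_linearize(frag_sizes,
                    max_per_desc=MAX_DATA_PER_DESC,
                    max_desc=MAX_DESC_PER_PACKET):
    """Return True if the fragments need more descriptors than the budget allows."""
    total = sum(descriptors_for_frag(s, max_per_desc) for s in frag_sizes)
    return total > max_desc


small_frags = [4096] * 7            # 7 fragments, 1 descriptor each
one_big_frag = [4096] * 6 + [20480]  # 7 fragments, but the last needs 3 descriptors

print(needs_linearize(small_frags))   # 7 descriptors fit the assumed budget
print(needs_linearize(one_big_frag))  # 9 descriptors exceed the assumed budget
```

Both sample packets have seven fragments, yet only the second exceeds the assumed budget, which is the kind of case where linearizing avoids handing the hardware a descriptor chain it may reject.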

[Other Info]

The previous bug for this issue is bug 1700834.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1713553

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Dan Streetman (ddstreet) on 2017-08-28
Changed in linux (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → Medium
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet) wrote :

Note there is one additional upstream commit that improves performance by allowing up to 12k per tx descriptor, instead of 8k per descriptor (the current code in Xenial 4.4 kernel), and its changes are related to the fixes for this issue. However, from my reading of the code, I don't think that commit is actually required to fix this problem, so I am not including it in this bug (yet).

commit 5c4654daf2e2f25dfbd7fa572c59937ea6d4198b
Author: Alexander Duyck <email address hidden>
Date: Fri Feb 19 12:17:08 2016 -0800

    i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K
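
For a rough sense of what the 8K versus 12K limit means, the sketch below compares descriptor counts for a few fragment sizes using plain ceiling division. This is an assumption for illustration; the real driver's calculation may differ (for example in how it aligns or approximates), and the helper name `descriptors_needed` is hypothetical.

```python
def descriptors_needed(size, max_per_desc):
    """Number of descriptors for `size` bytes, using plain ceiling division."""
    return (size + max_per_desc - 1) // max_per_desc


LIMIT_8K = 8 * 1024    # per-descriptor limit in the Xenial 4.4 code, per the comment above
LIMIT_12K = 12 * 1024  # per-descriptor limit after commit 5c4654da

print("size   8K  12K")
for size in (4096, 16384, 20480, 65536):
    print(size,
          descriptors_needed(size, LIMIT_8K),
          descriptors_needed(size, LIMIT_12K))
```

Under this model a 64K fragment drops from 8 descriptors to 6, which is consistent with the commit being described as a performance improvement rather than a fix.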

Dan Streetman (ddstreet) wrote :

Re: my last comment, testing confirmed that commit 5c4654 is *not* needed to fix this bug, so I am not including it. Only commit 841493a3 as listed in the bug description is required to fix this.

Stefan Bader (smb) on 2017-09-15
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Dan Streetman (ddstreet) wrote :

The original reporter verified to me that, with the patch, the problem has not recurred for several days, whereas previously they could reproduce it within a day. Unfortunately, as this problem is hard to reproduce, that is the best verification I can currently provide.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-97.120

---------------
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
    - [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
    - irqchip: mips-gic: SYNC after enabling GIC region
    - i2c: ismt: Don't duplicate the receive length for block reads
    - i2c: ismt: Return EMSGSIZE for block reads with bogus length
    - ceph: fix readpage from fscache
    - cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
    - cpuset: Fix incorrect memory_pressure control file mapping
    - alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
    - CIFS: remove endian related sparse warning
    - wl1251: add a missing spin_lock_init()
    - xfrm: policy: check policy direction value
    - drm/ttm: Fix accounting error when fail to get pages for pool
    - kvm: arm/arm64: Fix race in resetting stage2 PGD
    - kvm: arm/arm64: Force reading uncached stage2 PGD
    - epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
    - crypto: algif_skcipher - only call put_page on referenced and used pages
    - Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
    - scsi: isci: avoid array subscript warning
    - ALSA: au88x0: Fix zero clear of stream->resources
    - btrfs: remove duplicate const specifier
    - i2c: jz4780: drop superfluous init
    - gcov: add support for gcc version >= 6
    - gcov: support GCC 7.1
    - lightnvm: initialize ppa_addr in dev_to_generic_addr()
    - p54: memset(0) whole array
    - lpfc: Fix Device discovery failures during switch reboot test.
    - arm64: mm: abort uaccess retries upon fatal signal
    - x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
    - arm64: fpsimd: Prevent registers leaking across exec
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
    - Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - ipv4: better IP_MAX_MTU enforcement
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - mei: me: add broxton pci device ids
    - mei: me: add lewisburg device ids
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Björn Zettergren (bjozet) wrote :

Hi,

Thanks for your efforts with this issue; however, we're still experiencing problems with the newest kernel. Sorry about missing the patch-testing window, we should have been there for you :)

After only 20 minutes of runtime with the new kernel, we saw the following, and networking is basically useless:

[ 2.410644] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
[ 2.419791] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[ 2.483362] i40e 0000:02:00.0: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.16
[ 2.896678] i40e 0000:02:00.0: MAC address: 3c:fd:fe:1a:b5:e0
[ 2.903768] i40e 0000:02:00.0: SAN MAC: 3c:fd:fe:1a:b5:e1
[ 3.189818] i40e 0000:02:00.0: PCI-Express: Speed 8.0GT/s Width x4
[ 3.193934] i40e 0000:02:00.0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[ 3.202198] i40e 0000:02:00.0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[ 3.241095] i40e 0000:02:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[ 3.279202] i40e 0000:02:00.1: fw 5.40.47690 api 1.5 nvm 5.40 0x80002d35 18.0.16
[ 3.531346] i40e 0000:02:00.1: MAC address: 3c:fd:fe:1a:b5:e2
[ 3.539557] i40e 0000:02:00.1: SAN MAC: 3c:fd:fe:1a:b5:e3
[ 3.761719] i40e 0000:02:00.1: PCI-Express: Speed 8.0GT/s Width x4
[ 3.765721] i40e 0000:02:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[ 3.773539] i40e 0000:02:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[ 3.812022] i40e 0000:02:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 4 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[ 3.855168] i40e 0000:02:00.0 p1p1: renamed from eth2
[ 3.895278] i40e 0000:02:00.1 p1p2: renamed from eth0
[ 7.205832] i40e 0000:02:00.1 p1p2: already using mac address 3c:fd:fe:1a:b5:e2
[ 7.208378] i40e 0000:02:00.1 p1p2: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[ 7.208401] i40e 0000:02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e2 vid=0
[ 7.208453] i40e 0000:02:00.0 p1p1: set new mac address 3c:fd:fe:1a:b5:e2
[ 7.217191] i40e 0000:02:00.0 p1p1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[ 7.217215] i40e 0000:02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e2 vid=0
[ 7.240919] i40e 0000:02:00.1 p1p2: set new mac address 3c:fd:fe:1a:b5:e0
[ 7.252720] i40e 0000:02:00.0 p1p1: returning to hw mac address 3c:fd:fe:1a:b5:e0
[ 7.324791] i40e 0000:02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 7.324798] i40e 0000:02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1109.574733] i40e 0000:02:00.1: TX driver issue detected, PF reset issued
[ 1110.011152] i40e 0000:02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=0
[ 1110.011155] i40e 0000:02:00.1 p1p2: adding 3c:fd:fe:1a:b5:e0 vid=5
[ 1110.013749] i40e 0000:02:00.1: TX driver issue detected, PF reset issued
[ 1110.013773] i40e 0000:02:00.1 p1p2: speed changed to 0 for port p1p2
[ 1110.013954] bond0: link status up again after 0 ms for interface p1p2
[ 1110.983823] i40e 0000:02:00.1 p1p2: adding 3c:fd:fe:...
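
The bracketed values in the log above are seconds since boot, so the time to the first reset can be read off directly. The following is a minimal sketch added for illustration; the function name `first_reset_uptime` is hypothetical, and the regular expression assumes the timestamp format shown in the excerpt.

```python
import re

# Leading "[ seconds.microseconds]" timestamp as printed in the excerpt above.
TS_RE = re.compile(r"^\[\s*(?P<secs>\d+\.\d+)\]")
RESET_MSG = "TX driver issue detected, PF reset issued"


def first_reset_uptime(lines):
    """Return seconds since boot of the first PF reset message, or None."""
    for line in lines:
        if RESET_MSG in line:
            match = TS_RE.match(line)
            if match:
                return float(match.group("secs"))
    return None


sample = [
    "[ 7.324798] i40e 0000:02:00.0 p1p1: adding 3c:fd:fe:1a:b5:e0 vid=5",
    "[ 1109.574733] i40e 0000:02:00.1: TX driver issue detected, PF reset issued",
    "[ 1110.013749] i40e 0000:02:00.1: TX driver issue detected, PF reset issued",
]

secs = first_reset_uptime(sample)
print(f"first reset after {secs / 60:.1f} minutes of uptime")
```

For the excerpt above this gives roughly 18.5 minutes, in line with the "only 20 minutes of runtime" reported in this comment.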

Dan Streetman (ddstreet) wrote :

> however we're still experiencing problems with the newest kernel

Well, I was afraid of that. As this problem is the NIC firmware complaining but not actually telling us what it's unhappy with, there's a bit of trial and error here in figuring out what exactly it's complaining about.

Since this bug is already 'fix released', I opened a new bug 1723127 to track continuing work on this. Let's move the discussion over there.

Stefan Kooman (stefan-n1) wrote :

Hi there. I can confirm this problem still exists in the newest kernels and with the latest Intel drivers as of today:

Jan 19 16:05:19 osd9 kernel: [511271.581413] i40e 0000:02:00.1: TX driver issue detected, PF reset issued
Jan 19 16:09:08 osd9 kernel: [511500.919380] i40e 0000:02:00.0: TX driver issue detected, PF reset issued

driver: i40e-2.4.3 (and xenial / 4.13 shipped driver: 2.1.14-k)
kernel: 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux. Kernel loaded with nopti noibrs noibpb (Meltdown / Spectre mitigation disabled).

We can trigger the issue with high load (benchmarking Ceph cluster with fio: 4 clients, 8 threads, iodepth 256, 100% random write, 64K block size).

We only hit this problem with a relatively large block size (64K). With 4K blocks we do not hit this issue. We haven't tested large random reads (that test is still to be done).

When using an openvswitch port-channel (as we do) with jumbo frames, the port-channel does not come back online after the reset. rmmod i40e / modprobe i40e does the trick though.

Dan Streetman (ddstreet) wrote :

@stefan-n1, please move discussion over to bug 1723127, no more comments should be added to this bug.

Dan Streetman (ddstreet) on 2018-05-25
Changed in linux (Ubuntu):
status: In Progress → Fix Released