i40e PF reset due to incorrect MDD event

Bug #1772675 reported by Dan Streetman on 2018-05-22
This bug affects 5 people
Affects                  Status        Importance  Assigned to
linux (Ubuntu)           Fix Released  Low         Heitor Alves de Siqueira
linux (Ubuntu Xenial)    Fix Released  High        Heitor Alves de Siqueira
linux (Ubuntu Bionic)    Fix Released  High        Heitor Alves de Siqueira
linux (Ubuntu Cosmic)    Won't Fix     Low         Heitor Alves de Siqueira

Bug Description

[Impact]
The i40e driver sometimes triggers a "malicious driver detection" (MDD) event in the firmware, which causes the firmware to reset the NIC. The reset interrupts network traffic at least temporarily, and it can cause further problems, e.g. if the interface is part of a bond.

[Fix]
MDD events issued for the PF are usually the result of a misconfigured TX descriptor, not of "bad" actions in the VFs. We don't need to reset the whole NIC in that case; TX hang checks should handle those cases if necessary.

[Test Procedure]
The bug is unfortunately difficult to reproduce, as there's no detailed documentation on how the i40e firmware detects and raises MDDs. We have seen reports of this happening in Xenial and Bionic, for workloads stressing i40e bonds in LACP mode.
A successful reproduction is easy to detect, as the network traffic will be interrupted and the system logs will contain a message like:
i40e 0000:02:00.1: TX driver issue detected, PF reset issued
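
To check a saved kernel log for this message, something along these lines may help. This is only a minimal sketch: the regular expression is derived from the single example message above (other driver versions may word it differently), and the function name `find_mdd_pf_resets` and the sample log lines are made up for illustration.

```python
import re

# Pattern built from the one example message in this report (assumption:
# the wording is the same on the kernel being checked).
PF_RESET_RE = re.compile(
    r"i40e (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"TX driver issue detected, PF reset issued"
)


def find_mdd_pf_resets(log_text):
    """Return the PCI addresses of i40e functions that logged a PF reset."""
    return [m.group("pci") for m in PF_RESET_RE.finditer(log_text)]


# Made-up sample log; the timestamps and the second line are placeholders.
sample = (
    "[  100.000000] i40e 0000:02:00.1: TX driver issue detected, PF reset issued\n"
    "[  101.000000] some unrelated kernel message\n"
)
print(find_mdd_pf_resets(sample))
```

In practice the input would be the text of `dmesg` or a syslog file rather than the inline sample.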

An alternative test procedure makes use of the kprobes attached to the LP bug. The test setup is as follows:
- Create 2 VFs on primary NIC
- Passthrough VF 1 to a Bionic VM
- Start iperf3 client on VM, going through i40evf interface
- Start another iperf3 client on host, going through i40e interface
Both iperf3 clients should be using an external server located on a separate host. By loading the kprobe module while iperf3 is running, we should be able to raise MDDs more consistently. MDD behaviour can change according to firmware version, so we may need to try with different sets of probes. The one with the most consistent results seems to be 'corrupt_tx_desc_addr', which corrupts the cmd_type_offset_bsz field of the last TX descriptor before the NIC is notified of new data.
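
The host-side steps of this setup can be written down as a dry-run plan. The sketch below executes nothing; it only assumes the standard sysfs attribute `sriov_numvfs` for creating VFs and the `iperf3 -c <server> -t <seconds>` client invocation. The interface name `ens1f1`, the server address `192.0.2.10` and the helper names are hypothetical placeholders, and VF passthrough is hypervisor specific and left out.

```python
from pathlib import PurePosixPath


def sriov_numvfs_path(pf_iface):
    """sysfs attribute that controls how many VFs a PF exposes."""
    return PurePosixPath("/sys/class/net") / pf_iface / "device" / "sriov_numvfs"


def iperf3_client_cmd(server, seconds=60):
    """argv for an iperf3 client run against an external server."""
    return ["iperf3", "-c", server, "-t", str(seconds)]


def build_test_plan(pf_iface, server, num_vfs=2):
    """Describe the steps of the test setup without executing anything."""
    client = " ".join(iperf3_client_cmd(server))
    return [
        f"write {num_vfs} to {sriov_numvfs_path(pf_iface)}",
        "pass VF 1 through to the Bionic VM (hypervisor specific, not shown)",
        f"in the VM (i40evf interface): {client}",
        f"on the host (i40e interface): {client}",
    ]


# Placeholder interface name and documentation-range server address.
for step in build_test_plan("ens1f1", "192.0.2.10"):
    print(step)
```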

[Regression Potential]
Since we're removing resets for the NIC, regressions could show up as issues in connectivity after the MDD events are raised. If the firmware expects the whole NIC to reset, we could see TX/RX hangs and general unresponsiveness in networking. The potential for this should however be fairly low, as this patch has been present since kernel 5.2 and hasn't seen any fixes or regressions upstream. Basic smoke tests also showed that the driver continues working as expected, and that necessary PF resets will be issued by the netdev watchdog in case of any hung queues.

==
[original description]

This is a continuation from bug 1713553 and then bug 1723127; a patch was added in the first bug and then the second bug, to attempt to fix this, and it may have helped reduce the issue but appears not to have fixed it, based on more reports.

See bug 1713553 and bug 1723127 for more details.

Dan Streetman (ddstreet) wrote :

For details about i40e registers that may be able to help debug the cause of this, see bug 1723127 comment 10.

Also, a (possible) workaround to avoid this error is to disable TSO on the i40e nic.
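
For reference, TSO is normally toggled with `ethtool -K <iface> tso off`. A small wrapper could look like the sketch below; the helper names and the interface name `ens1f1` are illustrative only, and the default is a dry run so nothing is changed unless explicitly requested.

```python
import subprocess


def ethtool_offload_cmd(iface, features=("tso",), state="off"):
    """Build the argv for `ethtool -K <iface> <feature> <state> ...`."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd


def disable_tso(iface, dry_run=True):
    """Turn TSO off on iface; with dry_run=True only return the command."""
    cmd = ethtool_offload_cmd(iface, ("tso",), "off")
    if not dry_run:
        # Needs root privileges and the ethtool binary to be installed.
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(disable_tso("ens1f1")))
```

The setting does not persist across reboots or driver reloads, so it would have to be reapplied by whatever configures the interface.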

Changed in linux (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Dan Streetman (ddstreet)
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Dan Streetman (ddstreet) on 2018-05-25
Changed in linux (Ubuntu Xenial):
importance: Undecided → Low
Changed in linux (Ubuntu Bionic):
importance: Undecided → Low
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Low
Dan Streetman (ddstreet) wrote :

As I can't reproduce this, and I have heard no more reports of it, I'm marking this as incomplete. If anyone does actually still see this problem with the latest (x/b/c) kernel, please add a comment to this bug.

Changed in linux (Ubuntu Xenial):
assignee: Dan Streetman (ddstreet) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Dan Streetman (ddstreet) → nobody
Changed in linux (Ubuntu Cosmic):
assignee: Dan Streetman (ddstreet) → nobody
Changed in linux (Ubuntu Xenial):
status: In Progress → Incomplete
Changed in linux (Ubuntu Bionic):
status: In Progress → Incomplete
Changed in linux (Ubuntu Cosmic):
status: In Progress → Incomplete
Terry Hardie (terryh-orcas) wrote :

We are getting this error on all of our new systems (Dell 14G C6420) running Xenial. I've tried 4.4.0-139-generic. I'm now trying 4.13.0-45-generic to see if it still shows up there.

Dan Streetman (ddstreet) wrote :

@terryh-orcas,

if you are able to reproduce the problem relatively quickly and easily, then I suggest testing different kernel versions, up to the latest upstream, to see if and where it may be fixed with a newer i40e kernel driver. You can get upstream kernel debs here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D

If you can narrow down the kernel to a specific short range (i.e. kernel X definitely fails, kernel Y never fails), I can review the upstream i40e driver for specific changes to backport.
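
Narrowing the range is a plain binary search over an ordered list of kernels. The sketch below assumes there is a single transition from "fails" to "never fails" and that each kernel can be classified reliably; the version list and the stand-in `fails` check are purely hypothetical and do not reflect any test results.

```python
def narrow_kernel_range(versions, fails):
    """Binary-search an ordered list of kernel versions.

    versions: oldest to newest; versions[0] is known to fail and
              versions[-1] is known not to fail.
    fails:    callable returning True if the bug reproduces on a version.
    Returns (last_bad, first_good), assuming one bad -> good transition.
    """
    lo, hi = 0, len(versions) - 1  # invariant: lo fails, hi does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[lo], versions[hi]


# Hypothetical data: pretend the problem stops reproducing at the sixth entry.
candidates = ["k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8"]
pretend_bad = {"k1", "k2", "k3", "k4", "k5"}
print(narrow_kernel_range(candidates, lambda v: v in pretend_bad))
```

With eight candidates this needs three test runs instead of up to six, which matters when each reproduction attempt takes a long time.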

If you can't reproduce it easily/quickly, there is another method of debug involving undocumented i40e register modification. See bug 1723127 comment 10 for details. If you try that method, you should attempt it with the latest kernel you can reproduce the problem with. As I don't have the chipset specifications, if you do reproduce it this way and can isolate the problem to a specific register/bit, I'll have to take that info back to Intel to ask them for clarification. Also note that there are 2 registers that you have to test each bit individually for, so this method can take a very long time if it takes you a long time to reproduce the problem.

Unfortunately, as has been mentioned in this and past bugs, the MDD event is generated by the i40e firmware and there is no documented way to tell what the i40e kernel driver did that the firmware didn't like (assuming it was something the driver did, and not external or firmware issues). Intel does update their upstream i40e driver with fixes for MDD firmware/driver bugs regularly, so this will likely only be fixed by a patch coming from Intel upstream, which we would then need to backport to our older stable Ubuntu kernel(s).

Sorry I can't help more.

Oladimeji Fayomi (fayomidimeji) wrote :

Hi,
We have disabled TSO and GSO and we are still experiencing the interface resets. This usually happens under high load.

Kernel version: 4.4.0-135-generic

driver: i40e
version: 2.4.10
firmware-version: 6.01 0x80003493 0.0.0
bus-info: 0000:05:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
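
The listing above has the `key: value` shape printed by `ethtool -i`. When collecting this from several machines, a small parser can make the reports easier to compare. This is a sketch under that assumption about the format; `parse_ethtool_info` is a hypothetical helper name, and the sample is copied from the listing above.

```python
def parse_ethtool_info(text):
    """Turn `key: value` lines, as printed by `ethtool -i`, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses
        # (which contain colons themselves) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


report = """\
driver: i40e
version: 2.4.10
firmware-version: 6.01 0x80003493 0.0.0
bus-info: 0000:05:00.1
"""
info = parse_ethtool_info(report)
print(info["driver"], info["version"], info["firmware-version"].split()[0])
```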

Dan Streetman (ddstreet) wrote :

@fayomidimeji, my comment 4 applies to you as well.

Additionally, you both might want to verify you are actually seeing this problem, and not something else.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Bionic) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Bionic):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Cosmic) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Cosmic):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Xenial) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Xenial):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Changed in linux (Ubuntu):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in linux (Ubuntu):
status: Expired → Fix Released
Changed in linux (Ubuntu Xenial):
status: Expired → In Progress
Changed in linux (Ubuntu Bionic):
status: Expired → In Progress
Changed in linux (Ubuntu Cosmic):
status: Expired → In Progress
tags: added: sts
Changed in linux (Ubuntu Cosmic):
status: In Progress → Won't Fix

The patch [0] is present in Ubuntu releases starting with Focal, so it should be fixed for that and later releases (including current bionic-hwe). I'll look into backporting and testing this for Xenial and Bionic GA.

$ git log --oneline -1 a1df906c5be7
  a1df906c5be7 i40e: change behavior on PF in response to MDD event
$ git describe --contains a1df906c5be7
  v5.2-rc1~133^2~57^2~9

[0] https://git.kernel.org/linus/a1df906c5be7
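
As a reading aid for the `git describe --contains` output: the part before the first `~` or `^` is the earliest tag that contains the commit, here v5.2-rc1, which is why the patch counts as present from the 5.2 series onwards. A minimal sketch of that interpretation, with hypothetical helper names:

```python
import re


def containing_tag(describe_output):
    """Return the tag part of `git describe --contains` output.

    The output is a tag name followed by an optional ~N / ^N ancestry suffix.
    """
    return re.split(r"[~^]", describe_output.strip(), maxsplit=1)[0]


def mainline_series(tag):
    """Map a tag such as 'v5.2-rc1' to the (major, minor) series (5, 2)."""
    m = re.match(r"v(\d+)\.(\d+)", tag)
    if m is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return int(m.group(1)), int(m.group(2))


tag = containing_tag("v5.2-rc1~133^2~57^2~9")
print(tag, mainline_series(tag))
```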

I've pushed some test kernels for X/B to a public PPA assigned to this bug [0]. Hopefully it'll make validating the upstream patch a bit easier.

[0] https://launchpad.net/~halves/+archive/ubuntu/lp1772675

Changed in linux (Ubuntu Xenial):
importance: Low → High
Changed in linux (Ubuntu Bionic):
importance: Low → High
description: updated
description: updated
summary: Intel i40e PF reset due to incorrect MDD detection
- (continues...again...)
summary: - Intel i40e PF reset due to incorrect MDD detection
+ i40e PF reset due to incorrect MDD event

Given that we could be changing reset behavior that the firmware might expect, I wrote a quick set of kprobes to force the firmware to raise MDD events and test out the patched kernel from the PPA.

I tried to force faulty TX descriptors according to "Table 7-138. Tx Descriptor Validity Checks" in the XL710 Datasheet, under section "7.6.2.2.1 Interrupt on Misbehavior of VM (Malicious Driver Detection)". This document is publicly available at Intel's Technical Library site for this NIC.

The test setup is as follows:
- Create 2 VFs on primary NIC
- Passthrough VF 1 to a Bionic VM
- Start iperf3 client on VM, going through i40evf interface
- Start another iperf3 client on host, going through i40e interface

The iperf3 servers in my testing were running on a separate host, so I only had clients using the i40e NIC. This was primarily to verify what the networking and connectivity impact would be if we ran into any MDDs.

After both iperf3 clients were running, I loaded the kprobe modules according to a specific TX check to validate. Raising MDDs on the VF turned out to be pretty trivial, and most of the i40e probes also work on i40evf. MDDs on the PF were a bit more tricky to get, but I had good results with corrupting the final TX descriptor's cmd_type_offset_bsz field. As this happens right before the driver notifies the NIC about the new data, it should force the firmware to raise the MDD event, as opposed to us "manually" triggering it from the driver. This has the benefit of keeping things consistent from the firmware's point of view, as in the end it is the one responsible for detecting and notifying the kernel about those events.
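
To make the cmd_type_offset_bsz corruption more concrete, the sketch below packs and unpacks a 64-bit value using the field positions as I recall them from the upstream driver's I40E_TXD_QW1_* shift definitions (DTYPE at bit 0, CMD at bit 4, OFFSET at bit 16, buffer size at bit 34, L2TAG1 at bit 48). Treat the layout, the field widths and the command value 0x3 as assumptions to be checked against the driver headers and the datasheet. This is not the kprobe code attached to the bug, and whether a particular change actually raises an MDD depends on the validity checks and the firmware version.

```python
# Assumed layout of the TX data descriptor's cmd_type_offset_bsz qword:
# name -> (shift, width). Verify against the i40e headers before relying on it.
FIELDS = {
    "dtype": (0, 4),
    "cmd": (4, 12),
    "offset": (16, 18),
    "buf_sz": (34, 14),
    "l2tag1": (48, 16),
}


def pack_ctob(**values):
    """Combine the named fields into one 64-bit integer."""
    qword = 0
    for name, (shift, width) in FIELDS.items():
        value = values.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        qword |= value << shift
    return qword


def unpack_ctob(qword):
    """Split a 64-bit integer back into the named fields."""
    return {
        name: (qword >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def clear_field(qword, name):
    """Zero one field, as a stand-in for 'corrupting' the descriptor."""
    shift, width = FIELDS[name]
    return qword & ~(((1 << width) - 1) << shift)


good = pack_ctob(dtype=0, cmd=0x3, offset=0, buf_sz=1500)  # illustrative values
bad = clear_field(good, "buf_sz")
print(unpack_ctob(good))
print(unpack_ctob(bad))
```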

The primary point with this test was to verify whether we could leave the NIC in an inconsistent state, by avoiding or delaying the PF reset. The results were promising, and should hopefully give some more data on the value of the upstream patch.

When raising MDDs on the VF, the firmware correctly slaps the appropriate queues and schedules any resets as required. This is the same behavior as before. With the test kernel however, we don't issue any resets to the PF, so the iperf3 tests continue running uninterrupted as desired.

When raising MDDs on the PF, we no longer issue any resets, and depending on which probe was used, connectivity will stop momentarily. The netdev watchdog kicks in shortly afterwards and issues a PF reset as appropriate, and network connectivity resumes. This confirms that even with the upstream patch, any hung queues that don't reset immediately will recover afterwards, as the queue watchdogs will take care of those. This is consistent with the upstream behavior, and the kernel logs look similar to the one below:

[ 573.279608] NETDEV WATCHDOG: ens1f1 (i40e): transmit queue 1 timed out
[ 573.279652] WARNING: CPU: 14 PID: 0 at /build/linux-lqvoqZ/linux-4.15.0/net/sched/sch_generic.c:323 dev_watchdog+0x221/0x230
[ 573.279659] Modules linked in: vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio i40evf xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt...
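
When going through longer logs from such a test run, the watchdog timeouts can be pulled out with a small filter. The pattern below is derived only from the first line of the excerpt above, so it is a sketch that may need adjusting for other kernels; `watchdog_timeouts` is a hypothetical helper name.

```python
import re

# Built from the "NETDEV WATCHDOG" line in the excerpt above.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def watchdog_timeouts(log_text, driver="i40e"):
    """Return (interface, queue) pairs for watchdog timeouts of one driver."""
    return [
        (m.group("iface"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(log_text)
        if m.group("driver") == driver
    ]


line = "[ 573.279608] NETDEV WATCHDOG: ens1f1 (i40e): transmit queue 1 timed out"
print(watchdog_timeouts(line))
```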


I'm attaching kprobes for the Xenial kernels. These are based on the 4.4.0-203/204 versions.
I followed the same test setup for Bionic, as described in the previous comment, and also had similar results. The netdev watchdog seems to take good care of any hung queues, so in the end PF resets will be issued regardless, if necessary.

description: updated
description: updated
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-needed-xenial

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Verified for Xenial following the same test procedure from comment #13, running on the current -proposed kernel:
ubuntu@snorlax:~/kprobe$ uname -rv
4.4.0-207-generic #239-Ubuntu SMP Thu Mar 25 02:59:26 UTC 2021

The offsets don't seem to have changed on any of the probes, so I've used the same set that's already uploaded to the bug. The netdev watchdog kicks in if queues hang, and general test results look good.

tags: added: verification-done-xenial
removed: verification-needed-xenial

Verified for Bionic following the same test procedure from comment #13, running on the current -proposed kernel:
ubuntu@snorlax:~$ uname -rv
4.15.0-141-generic #145-Ubuntu SMP Wed Mar 24 18:08:07 UTC 2021

I had to adjust the tx_desc_addr probes, subtracting 0x4 from both offsets and changing the relevant registers (r15 for i40e, r11 for i40evf). Other probes didn't need any changes.
The test results were similar to those on Xenial: the netdev watchdog works as expected, and no major issues were encountered with a smoke test.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-141.145

---------------
linux (4.15.0-141.145) bionic; urgency=medium

  * bionic/linux: 4.15.0-141.145 -proposed tracker (LP: #1919536)

  * binary assembly failures with CONFIG_MODVERSIONS present (LP: #1919315)
    - [Packaging] quiet (nomially) benign errors in BUILD script

  * selftests: bpf verifier fails after sanitize_ptr_alu fixes (LP: #1920995)
    - bpf: Simplify alu_limit masking for pointer arithmetic
    - bpf: Add sanity check for upper ptr_limit
    - bpf, selftests: Fix up some test_verifier cases for unprivileged

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2018-13095
    - xfs: More robust inode extent count validation

  * i40e PF reset due to incorrect MDD event (LP: #1772675)
    - i40e: change behavior on PF in response to MDD event

  * Bionic update: upstream stable patchset 2021-03-09 (LP: #1918330)
    - ACPI: sysfs: Prefer "compatible" modalias
    - ARM: dts: imx6qdl-gw52xx: fix duplicate regulator naming
    - wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
    - net: usb: qmi_wwan: added support for Thales Cinterion PLSx3 modem family
    - drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs
    - drivers: soc: atmel: add null entry at the end of at91_soc_allowed_list[]
    - KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in
      intel_arch_events[]
    - KVM: x86: get smi pending status correctly
    - xen: Fix XenStore initialisation for XS_LOCAL
    - leds: trigger: fix potential deadlock with libata
    - mt7601u: fix kernel crash unplugging the device
    - mt7601u: fix rx buffer refcounting
    - xen-blkfront: allow discard-* nodes to be optional
    - ARM: imx: build suspend-imx6.S with arm instruction set
    - netfilter: nft_dynset: add timeout extension to template
    - xfrm: Fix oops in xfrm_replay_advance_bmp
    - RDMA/cxgb4: Fix the reported max_recv_sge value
    - iwlwifi: pcie: use jiffies for memory read spin time limit
    - iwlwifi: pcie: reschedule in long-running memory reads
    - mac80211: pause TX while changing interface type
    - can: dev: prevent potential information leak in can_fill_info()
    - x86/entry/64/compat: Preserve r8-r11 in int $0x80
    - x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
    - iommu/vt-d: Gracefully handle DMAR units with no supported address widths
    - iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built
    - NFC: fix resource leak when target index is invalid
    - NFC: fix possible resource leak
    - team: protect features update by RCU to avoid deadlock
    - tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN
    - kernel: kexec: remove the lock operation of system_transition_mutex
    - PM: hibernate: flush swap writer after marking
    - pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
    - net/mlx5: Fix memory leak on flow table creation error flow
    - rxrpc: Fix memory leak in rxrpc_lookup_local
    - net: dsa: bcm_sf2: put device node before return
    - ibmvnic: Ensure that CRQ entry read are correctly ordered
    - ACPI: thermal: Do...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-208.240

---------------
linux (4.4.0-208.240) xenial; urgency=medium

  * xenial/linux: 4.4.0-208.240 -proposed tracker (LP: #1922069)

  * linux ADT test failure with linux/4.4.0-207.239 -
    ubuntu_qrt_kernel_security.test-kernel-security.py (LP: #1922200) //
    CVE-2018-5953 // CVE-2018-5995 // CVE-2018-7754
    - SAUCE: Revert "printk: hash addresses printed with %p"

  * lxd 2.0.11-0ubuntu1~16.04.4 ADT test failure with linux 4.4.0-207.239
    (LP: #1921969)
    - SAUCE: Fix fuse regression in 4.4.0-207.239

linux (4.4.0-207.239) xenial; urgency=medium

  * xenial/linux: 4.4.0-207.239 -proposed tracker (LP: #1919558)

  * Xenial update: v4.4.262 upstream stable release (LP: #1920221)
    - uapi: nfnetlink_cthelper.h: fix userspace compilation error
    - ath9k: fix transmitting to stations in dynamic SMPS mode
    - net: Fix gro aggregation for udp encaps with zero csum
    - can: skb: can_skb_set_owner(): fix ref counting if socket was closed before
      setting skb ownership
    - can: flexcan: assert FRZ bit in flexcan_chip_freeze()
    - can: flexcan: enable RX FIFO after FRZ/HALT valid
    - netfilter: x_tables: gpf inside xt_find_revision()
    - cifs: return proper error code in statfs(2)
    - floppy: fix lock_fdc() signal handling
    - Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
    - futex: Change locking rules
    - futex: Cure exit race
    - futex: fix dead code in attach_to_pi_owner()
    - net/mlx4_en: update moderation when config reset
    - net: lapbether: Remove netif_start_queue / netif_stop_queue
    - net: davicom: Fix regulator not turned off on failed probe
    - net: davicom: Fix regulator not turned off on driver removal
    - media: usbtv: Fix deadlock on suspend
    - mmc: mxs-mmc: Fix a resource leak in an error handling path in
      'mxs_mmc_probe()'
    - mmc: mediatek: fix race condition between msdc_request_timeout and irq
    - powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
    - PCI: xgene-msi: Fix race in installing chained irq handler
    - s390/smp: __smp_rescan_cpus() - move cpumask away from stack
    - scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
    - ALSA: hda/hdmi: Cancel pending works before suspend
    - ALSA: hda: Avoid spurious unsol event handling during S3/S4
    - ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
    - s390/dasd: fix hanging DASD driver unbind
    - mmc: core: Fix partition switch time for eMMC
    - scripts/recordmcount.{c,pl}: support -ffunction-sections .text.* section
      names
    - Goodix Fingerprint device is not a modem
    - usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio
      slot
    - usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
    - xhci: Improve detection of device initiated wake signal.
    - USB: serial: io_edgeport: fix memory leak in edge_startup
    - USB: serial: ch341: add new Product ID
    - USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
    - USB: serial: cp210x: add some more GE USB IDs
    - usbip: fix stub_dev to check for stream ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
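
To check whether a machine is running at least the released package version, the ABI number in `uname -r` can be compared against the changelog entries above (4.4.0-208 for Xenial, 4.15.0-141 for Bionic). This is a rough sketch with hypothetical helper names. It only covers the two GA series named in this bug, and it treats the -proposed build 4.4.0-207, which already carried the patch according to the verification comment, as older than the release.

```python
import re

# First released ABI numbers carrying the fix, taken from the changelogs above.
FIX_RELEASED = {(4, 4, 0): 208, (4, 15, 0): 141}


def parse_release(uname_r):
    """Split e.g. '4.15.0-141-generic' into ((4, 15, 0), 141)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", uname_r)
    if m is None:
        raise ValueError(f"unexpected kernel release: {uname_r!r}")
    major, minor, patch, abi = (int(g) for g in m.groups())
    return (major, minor, patch), abi


def at_or_after_fix_release(uname_r):
    """True/False for the Xenial and Bionic GA series, None for anything else."""
    base, abi = parse_release(uname_r)
    if base not in FIX_RELEASED:
        return None
    return abi >= FIX_RELEASED[base]


for release in ("4.4.0-139-generic", "4.15.0-141-generic"):
    print(release, at_or_after_fix_release(release))
```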