Intel i40e PF reset due to incorrect MDD detection (continues...again...)

Bug #1772675 reported by Dan Streetman on 2018-05-22
This bug affects 3 people
Affects: linux (Ubuntu), status tracked in Cosmic

Series   Importance   Assigned to
Xenial   Low          Dan Streetman
Bionic   Low          Dan Streetman
Cosmic   Low          Dan Streetman

Bug Description

[impact]

The i40e driver sometimes triggers a "Malicious Driver Detection" (MDD) event that the firmware detects; the firmware then resets the NIC. The reset causes at least a temporary interruption in network traffic, and it can cause further problems, e.g. if the interface is in a bond.

[fix]

The fix for this is currently unknown. As the "MDD event" is generated by the i40e firmware, and is completely undocumented, there is no way to tell what the i40e driver did to cause the MDD event.

[test case]

The bug is unfortunately very difficult to reproduce, but as shown in the comments on this bug and the previous ones, some i40e users have traffic that reproduces the problem consistently (although it usually takes days, or longer). A successful reproduction is easy to detect: network traffic is interrupted and the system logs contain a message like:

i40e 0000:02:00.1: TX driver issue detected, PF reset issued
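
Since a reproduction can take days, it may help to scan saved kernel logs for that message automatically. The following is a minimal sketch, not part of the original report: the regular expression is modelled only on the single log line quoted above, and the names `MDD_RESET_RE` and `find_pf_resets` are hypothetical. The exact wording of the message may differ between kernel versions.

```python
import re

# Assumption: the message has the same shape as the line quoted in this bug,
# i.e. "i40e <PCI address>: TX driver issue detected, PF reset issued".
MDD_RESET_RE = re.compile(
    r"i40e (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"TX driver issue detected, PF reset issued"
)


def find_pf_resets(lines):
    """Return the PCI address from every line that reports a PF reset."""
    hits = []
    for line in lines:
        match = MDD_RESET_RE.search(line)
        if match:
            hits.append(match.group("pci"))
    return hits
```

The function takes any iterable of log lines, so it could be fed the lines of a saved `dmesg` dump or of a kernel log file.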

[regression potential]

Unknown, since the specific fix is unknown.

[original description]

This is a continuation of bug 1713553 and then bug 1723127. A patch was added in each of those bugs to attempt to fix this; the patches may have reduced how often the issue occurs but, based on further reports, they do not appear to have fixed it.

See bug 1713553 and bug 1723127 for more details.

Dan Streetman (ddstreet) wrote :

For details about i40e registers that may be able to help debug the cause of this, see bug 1723127 comment 10.

Also, a (possible) workaround to avoid this error is to disable TSO on the i40e nic.
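
As an illustration of that workaround, TSO can be switched off with `ethtool -K <interface> tso off`. The sketch below wraps that command in Python. The helper names and the interface name in the usage note are hypothetical, and the change needs root privileges and the `ethtool` binary. Settings made this way are normally lost on reboot, so a persistent setup would need the distribution's network configuration as well.

```python
import subprocess


def tso_off_command(iface):
    """Build the ethtool command line that disables TSO on one interface."""
    return ["ethtool", "-K", iface, "tso", "off"]


def disable_tso(iface):
    """Run the command; raises CalledProcessError if ethtool fails."""
    subprocess.run(tso_off_command(iface), check=True)
```

For example, `disable_tso("enp2s0f1")` would apply the workaround to an interface of that (made-up) name; `ethtool -k enp2s0f1` can then be used to check the resulting offload settings.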

Changed in linux (Ubuntu Xenial):
assignee: nobody → Dan Streetman (ddstreet)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Dan Streetman (ddstreet)
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Dan Streetman (ddstreet) on 2018-05-25
Changed in linux (Ubuntu Xenial):
importance: Undecided → Low
Changed in linux (Ubuntu Bionic):
importance: Undecided → Low
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Low