Missing IOTLB flush causes DMAR errors with SR-IOV
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | Undecided | Jay Vosburgh |
Trusty | Fix Released | Undecided | Unassigned |
Bug Description
SRU Justification:
Impact:
Systems using SR-IOV with Intel IOMMUs can exhibit DMAR errors of the
following type:
[606483.223009] DMAR:[fault reason 05] PTE Write access is not set
[606484.071974] dmar: DRHD: handling fault status reg 402
[606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] fault addr 35c6e000
The DMAR error causes, at a minimum, loss of network traffic
because the request being serviced is lost. Network cards were also
observed to experience transmit timeouts after a DMAR fault.
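To check whether a machine is hitting this signature, the kernel log can be scanned for the fault lines quoted above. The following is a minimal sketch, not an official tool: the function name `find_dmar_faults` is hypothetical, and the regular expressions are modeled only on the three log lines in this report, so other kernel versions may format these messages differently.

```python
import re

# Patterns modeled on the log lines quoted in this report (assumption:
# other kernels may word or prefix these messages differently).
FAULT_REASON = re.compile(r"DMAR:\[fault reason (\d+)\]\s*(.*)")
FAULT_REQUEST = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+)"
)


def find_dmar_faults(lines):
    """Return a list of dicts, one per DMAR fault line found in `lines`."""
    faults = []
    for line in lines:
        m = FAULT_REQUEST.search(line)
        if m:
            faults.append({
                "kind": "request",
                "access": m.group(1),          # "Read" or "Write"
                "device": m.group(2),          # bus:device.function
                "addr": int(m.group(3), 16),   # faulting DMA address
            })
            continue
        m = FAULT_REASON.search(line)
        if m:
            faults.append({
                "kind": "reason",
                "code": int(m.group(1)),
                "text": m.group(2).strip(),
            })
    return faults
```

The function accepts any iterable of strings, so it can be fed the lines of saved `dmesg` output or an open log file.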
In this case, these errors arise from a race condition in
the IOTLB management; this race is described (and fixed) in upstream
commit:
commit ea8ea460c9ace60
Author: David Woodhouse <email address hidden>
Date: Wed Mar 5 17:09:32 2014 +0000
iommu/vt-d: Clean up and fix page table clear/free behaviour
This commit first appeared in mainline 3.15. This issue
affects only the Ubuntu 3.13 kernel series.
Fix:
The race avoidance portion of the above was backported to
3.14-stable, but was never incorporated into the Ubuntu 3.13
kernel series.
commit 51d20e1096a711f
Author: David Woodhouse <email address hidden>
Date: Mon Jun 9 14:09:53 2014 +0100
iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()
Based on commit ea8ea460c9ace60
This 3.14-stable patch was tested by the customer and observed
to resolve the issue in their environment.
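Why a missing IOTLB flush on unmap can surface as a write-permission fault can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the kernel code, the class `ToyIommu` and the function `run` are hypothetical, the addresses are arbitrary, and the exact sequence of events in the race fixed by the commits above may differ. It shows one plausible way a stale cached translation can outlive the mapping it came from.

```python
class ToyIommu:
    """Toy model: a page table plus an IOTLB that caches its entries."""

    def __init__(self, flush_on_unmap):
        self.flush_on_unmap = flush_on_unmap
        self.page_table = {}  # iova -> (phys, writable)
        self.iotlb = {}       # cached copies of page-table entries

    def map(self, iova, phys, writable):
        self.page_table[iova] = (phys, writable)

    def unmap(self, iova):
        self.page_table.pop(iova, None)
        if self.flush_on_unmap:
            # Invalidate the cached translation along with the mapping.
            self.iotlb.pop(iova, None)

    def dma(self, iova, write):
        """Return the physical address accessed, or raise on a fault."""
        entry = self.iotlb.get(iova)
        if entry is None:
            # IOTLB miss: walk the page table and cache the result.
            entry = self.page_table.get(iova)
            if entry is None:
                raise PermissionError("fault: page not present")
            self.iotlb[iova] = entry
        phys, writable = entry
        if write and not writable:
            raise PermissionError("fault: PTE write access is not set")
        return phys


def run(flush_on_unmap):
    """Map read-only, unmap, remap writable, then attempt a DMA write."""
    iommu = ToyIommu(flush_on_unmap)
    iommu.map(0x1000, 0xA000, writable=False)
    iommu.dma(0x1000, write=False)   # caches the read-only entry
    iommu.unmap(0x1000)
    iommu.map(0x1000, 0xB000, writable=True)
    try:
        return iommu.dma(0x1000, write=True)
    except PermissionError as exc:
        return str(exc)
```

In this model, `run(False)` ends in a write-permission fault because the device still sees the stale read-only entry, while `run(True)` reaches the new mapping.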
Testcase:
In this case, the issue occurs at a customer site on very recent
Intel-based servers using two different SR-IOV network cards (driven by
i40e and bnxt). The customer has tested the patch in their environment
and confirmed that it resolves the issue.
Changed in linux (Ubuntu):
assignee: nobody → Jay Vosburgh (jvosburgh)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu):
status: In Progress → Confirmed
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1697053
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.