Xenial: data corruption when using i40e with iommu

Bug #1802421 reported by Daniel Axtens on 2018-11-09
Affects           Status   Importance   Assigned to   Milestone
linux (Ubuntu)             Undecided    Unassigned
  Xenial                   Undecided    Unassigned

Bug Description

A user reports that using an i40e with intel_iommu=on with the Xenial GA kernel causes data corruption. Using the Xenial HWE kernel or an out-of-tree driver more recent than the version shipped with Xenial solves the issue.
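
To check whether a machine matches the affected configuration, one could look for intel_iommu=on on the kernel command line. The following is a minimal sketch, not part of the original report; the function names are hypothetical, and it only inspects /proc/cmdline, not the i40e driver version.

```python
def iommu_enabled(cmdline: str) -> bool:
    """Return True if intel_iommu=on appears as a kernel command line token."""
    return "intel_iommu=on" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a running system, iommu_enabled(read_cmdline()) reports whether the option was passed at boot.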

[Impact]
Corrupted data is returned from the network card intermittently. This is often noticeable when using apt, because apt verifies checksums, and it often causes apt operations to fail. Where no checksums are verified, this could lead to silent data corruption.
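
To illustrate why a checksum-verifying client such as apt notices the problem while other transfers may not, here is a small sketch of digest verification. It is illustrative only: the payload and helper names are made up, and SHA-256 is assumed as the digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of received data against the published digest."""
    return sha256_hex(data) == expected_hex


# Hypothetical payload standing in for a downloaded index file.
payload = b"example package index contents"
expected = sha256_hex(payload)

# Simulate a single flipped bit in the received data.
corrupted = bytearray(payload)
corrupted[3] ^= 0x01
corrupted = bytes(corrupted)
```

Here is_intact(payload, expected) holds while is_intact(corrupted, expected) does not; without such a check, the flipped bit would pass unnoticed.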

[Fix]
This was fixed upstream at some point after 4.4. Testing identified b32bfa17246d ("i40e: Drop packet split receive routine"), which is part of a broader refactor, as the fix. Picking this patch alone is sufficient to fix the issue. My theory is that iommu exposes an issue in the packet split receive routine, so removing that routine is sufficient to prevent the problem from occurring.

[Test]
A user tested a Xenial 4.4 kernel with this patch applied and it fixed their issue - no data corruption was observed. (The test repeatedly deletes the apt cache and then does apt update.)
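
The user's actual test script is not included in this report. A rough sketch of such a loop might look like the following, assuming that "deleting the apt cache" means emptying /var/lib/apt/lists and that corruption shows up as a non-zero exit status or a "Hash Sum mismatch" / "Failed to fetch" message from apt-get update. All function names are hypothetical, and it must be run as root.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the default apt list directory; the report does not name it.
APT_LISTS = Path("/var/lib/apt/lists")


def clear_apt_lists(lists_dir=APT_LISTS):
    """Delete downloaded index files so the next update refetches everything."""
    for entry in Path(lists_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(str(entry))
        else:
            entry.unlink()


def update_failed(output):
    """Heuristic: look for apt messages that indicate a bad download."""
    markers = ("Hash Sum mismatch", "Failed to fetch")
    return any(marker in output for marker in markers)


def run_apt_update():
    """Run apt-get update and return (exit status, combined output)."""
    result = subprocess.run(
        ["apt-get", "update"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


def stress(iterations, clear=clear_apt_lists, update=run_apt_update):
    """Repeat clear + update and count the iterations that failed."""
    failures = 0
    for _ in range(iterations):
        clear()
        returncode, output = update()
        if returncode != 0 or update_failed(output):
            failures += 1
    return failures
```

Calling stress(100) as root would run one hundred clear-and-update cycles and return the number that failed; a return value of 0 corresponds to "no data corruption was observed".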

[Regression Potential]
It's a messy change inside i40e, so the risk is that i40e will be broken in some subtle way we haven't noticed, or will have performance issues. Neither has been observed so far.

Daniel Axtens (daxtens) on 2018-11-09
description: updated
Changed in linux (Ubuntu Xenial):
status: New → In Progress