Missing IOTLB flush causes DMAR errors with SR-IOV

Bug #1697053 reported by Jay Vosburgh
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Confirmed      Undecided    Jay Vosburgh
  Trusty         Fix Released   Undecided    Unassigned

Bug Description

SRU Justification:

Impact:

        Systems using SR-IOV with Intel IOMMUs can exhibit DMAR errors
of the following type:

[606483.223009] DMAR:[fault reason 05] PTE Write access is not set
[606484.071974] dmar: DRHD: handling fault status reg 402
[606484.077121] dmar: DMAR:[DMA Write] Request device [d8:0a.1] fault addr 35c6e000

        The DMAR error causes, at a minimum, loss of network traffic
because the request being serviced is lost. Network cards were also
observed to experience transmit timeouts after a DMAR fault.
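
        When triaging logs for this symptom, it can help to pull the
faulting device and address out of the kernel messages. The following is
a minimal sketch, not part of the original report: the patterns are
derived only from the three sample lines quoted above, the "Read"
alternative is an assumption (only "Write" appears in the sample), and
the function name parse_dmar_faults is hypothetical.

```python
import re

# Patterns inferred from the sample log lines in this report only.
# "Read" is an assumption; the report shows only "DMA Write".
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def parse_dmar_faults(lines):
    """Return (requests, reasons) found in an iterable of kernel log lines.

    requests: list of dicts with the DMA direction, the requesting
              device (bus:device.function) and the fault address.
    reasons:  list of (reason_code, reason_text) tuples.
    """
    requests = []
    reasons = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            requests.append({
                "op": match.group("op"),
                "device": match.group("device"),
                "addr": int(match.group("addr"), 16),
            })
            continue
        match = REASON_RE.search(line)
        if match:
            reasons.append((int(match.group("code")),
                            match.group("text").strip()))
    return requests, reasons
```

        Feeding it the lines above would yield one request for device
d8:0a.1 and one reason entry with code 5. Lines that match neither
pattern, such as the "handling fault status reg" line, are ignored.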

        In this case, these errors arise from a race condition in
the IOTLB management; this race is described (and fixed) in upstream
commit:

commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d
Author: David Woodhouse <email address hidden>
Date: Wed Mar 5 17:09:32 2014 +0000

    iommu/vt-d: Clean up and fix page table clear/free behaviour

        This commit first appeared in mainline 3.15. This issue
affects only the Ubuntu 3.13 kernel series.

Fix:

        The race avoidance portion of the above was backported to
3.14-stable, but was never incorporated into the Ubuntu 3.13
kernel series.

commit 51d20e1096a711f8cfa9d98a3ac2dd2c7c0fc20c
Author: David Woodhouse <email address hidden>
Date: Mon Jun 9 14:09:53 2014 +0100

    iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()

    Based on commit ea8ea460c9ace60bbb5ac6e5521d637d5c15293d upstream

        This 3.14-stable patch was tested by the customer and observed
to resolve the issue in their environment.

Testcase:

        In this case, the issue occurs on very recent Intel-based
servers using two different SR-IOV network cards (i40e and bnxt) at a
customer site. The customer has tested the patch in their environment
and confirmed that it resolves the issue.

Jay Vosburgh (jvosburgh)
Changed in linux (Ubuntu):
assignee: nobody → Jay Vosburgh (jvosburgh)
Joseph Salisbury (jsalisbury) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1697053

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Jay Vosburgh (jvosburgh)
Changed in linux (Ubuntu):
status: In Progress → Confirmed
Stefan Bader (smb)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Leonardo Borda (lborda) wrote :

That indeed solved the problem.

Leonardo Borda (lborda) wrote :

Sorry, let me correct that. I still need to validate with the -proposed kernel. The applied commit fixes the issue, though.

Jay Vosburgh (jvosburgh) wrote :

proposed kernel tested by customer

tags: added: verification-done-trusty
removed: verification-needed-trusty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-125.174

---------------
linux (3.13.0-125.174) trusty; urgency=low

  * linux: 3.13.0-125.174 -proposed tracker (LP: #1703396)

  * NULL pointer dereference triggered by openvswitch autopkg testcase
    (LP: #1703401)
    - Revert "rtnl/do_setlink(): notify when a netdev is modified"
    - Revert "rtnl/do_setlink(): last arg is now a set of flags"
    - Revert "rtnl/do_setlink(): set modified when IFLA_LINKMODE is updated"
    - Revert "rtnl/do_setlink(): set modified when IFLA_TXQLEN is updated"
    - Revert "rtnetlink: provide api for getting and setting slave info"

linux (3.13.0-124.173) trusty; urgency=low

  * linux: 3.13.0-124.173 -proposed tracker (LP: #1701042)

  * CVE-2017-7895
    - nfsd: Remove assignments inside conditions
    - svcrdma: Do not add XDR padding to xdr_buf page vector
    - nfsd4: minor NFSv2/v3 write decoding cleanup
    - nfsd: stricter decoding of write-like NFSv2/v3 ops

  * CVE-2017-9605
    - drm/vmwgfx: Make sure backup_handle is always valid

  * CVE-2017-1000380
    - ALSA: timer: Fix race between read and ioctl
    - ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT

  * linux <3.18: netlink notification is missing when an interface is modified
    (LP: #1690094)
    - rtnetlink: provide api for getting and setting slave info
    - rtnl/do_setlink(): set modified when IFLA_TXQLEN is updated
    - rtnl/do_setlink(): set modified when IFLA_LINKMODE is updated
    - rtnl/do_setlink(): last arg is now a set of flags
    - rtnl/do_setlink(): notify when a netdev is modified

  * CVE-2015-8944
    - Make file credentials available to the seqfile interfaces
    - /proc/iomem: only expose physical resource addresses to privileged users

  * CVE-2016-10088
    - sg_write()/bsg_write() is not fit to be called under KERNEL_DS

  * CVE-2017-7346
    - drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()

  * CVE-2015-8966
    - arm: fix handling of F_OFD_... in oabi_fcntl64()

  * Missing IOTLB flush causes DMAR errors with SR-IOV (LP: #1697053)
    - iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap()

  * CVE-2017-8924
    - USB: serial: io_ti: fix information leak in completion handler

  * CVE-2017-8925
    - USB: serial: omninet: fix reference leaks at open

  * CVE-2015-8967
    - arm64: make sys_call_table const

  * CVE-2015-8964
    - tty: Prevent ldisc drivers from re-using stale tty fields

  * CVE-2015-8955
    - arm64: perf: reject groups spanning multiple HW PMUs

  * CVE-2015-8962
    - sg: Fix double-free when drives detach during SG_IO

  * CVE-2015-8963
    - perf: Fix race in swevent hash

  * CVE-2017-9074
    - ipv6: Check ip6_find_1stfragopt() return value properly.

  * CVE-2014-9900
    - net: Zeroing the structure ethtool_wolinfo in ethtool_get_wol()

 -- Thadeu Lima de Souza Cascardo <email address hidden> Mon, 10 Jul 2017 13:02:31 -0300

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
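
For anyone checking whether a given Trusty host already carries the fix, a rough sketch follows. It is not from the original report. It assumes the release string has the usual Ubuntu form such as "3.13.0-125-generic", that the ABI number only increases across Trusty kernel uploads, and that the fix is not reverted in a later upload. The threshold 125 comes from the "fixed in the package linux - 3.13.0-125.174" message above; the function name trusty_kernel_has_fix is hypothetical.

```python
import re

# ABI number of the package named in the "This bug was fixed in" message.
FIXED_ABI = 125


def trusty_kernel_has_fix(release):
    """Classify a kernel release string such as '3.13.0-125-generic'.

    Returns True or False for Ubuntu 3.13 series kernels, and None for
    any other string, since the report says only the 3.13 series is
    affected.
    """
    match = re.match(r"^3\.13\.0-(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)) >= FIXED_ABI
```

On a live system the argument could come from platform.release() in the Python standard library, which returns the same string as `uname -r`.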