ISST-LTE:pVM nvme 0000:a0:00.0: iommu_alloc failed on NVMe card

Bug #1633128 reported by bugproxy on 2016-10-13
This bug affects 1 person
Affects          Importance   Assigned to
linux (Ubuntu)   Undecided    Tim Gardner
Xenial           Undecided    Tim Gardner
Yakkety          Undecided    Tim Gardner

Bug Description

== Comment: #0 - Chanh H. Nguyen <email address hidden> - 2016-06-02 14:22:00 ==
There are a large number of iommu_alloc failures in the dmesg log during our ST run on the NVMe card.

root@br502lp2:~# uname -r
4.4.0-22-generic
root@br502lp2:~# free -g
              total used free shared buff/cache available
Mem: 376 1 101 0 273 373
Swap: 10 0 10

root@br502lp2:~# dmesg |grep "iommu_alloc"
[ 844.572656] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000001c1a70000 npages 16
[ 844.572658] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000000c3510000 npages 16
[ 844.572664] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000023c0260000 npages 16
[ 844.572694] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000000c35b0000 npages 16
[ 844.572700] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000001c1a70000 npages 16
[ 844.572703] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000023c7a90000 npages 16
[ 844.572718] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000000c35b0000 npages 16
[ 844.572727] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000001c1a70000 npages 16
[ 844.572730] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000023c7a90000 npages 16
[ 844.572746] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c0000023c7a90000 npages 16
[ 1033.500635] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c000002a64500000 npages 16
[ 1033.500894] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c000002a64500000 npages 16
[ 1033.501290] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c000003b80d10000 npages 16
[ 1033.501337] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00 vaddr c000002903910000 npages 16
[ 1033.501590] nvme 0001:01:00.0: iommu_alloc failed, tbl c00000466a2bfc00
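
With this many repeated lines, a per-device count can be easier to read than the raw log. The following is a minimal shell sketch, assuming the message format shown above (the driver name `nvme` followed by the PCI address and a colon). The function name `summarize_iommu_failures` is hypothetical and not part of any tool mentioned in this report.

```shell
# Summarize "iommu_alloc failed" messages per PCI device.
# Reads kernel log text on stdin; prints "<pci-address> <count>" per line.
# Assumption: each message looks like
#   [ <time>] nvme <pci-address>: iommu_alloc failed, ...
summarize_iommu_failures() {
    awk '/iommu_alloc failed/ {
             # Find the driver name; the next field is the PCI address.
             for (i = 1; i < NF; i++) {
                 if ($i == "nvme") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # drop the trailing colon
                     count[dev]++
                     break
                 }
             }
         }
         END { for (d in count) print d, count[d] }' | sort
}
```

It could be used as `dmesg | summarize_iommu_failures`, which prints one line per affected device.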

root@br502lp2:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.4.0-22-generic root=UUID=2852021a-6954-42a9-8c69-7a58ccfbb65d ro xmon=on disable_ddw splash quiet crashkernel=384M-:128M

root@br502lp2:~# lsmcode
Version of System Firmware is FW840.20 (SC840_100) (t) FW840.10 (SC840_079) (p) FW840.20 (SC840_100) (b)
Version of PFW is 14492016042881CF0681

== Comment: #4 - Mauricio Faria De Oliveira <email address hidden> - 2016-10-13 10:16:40 ==
Hi Canonical,

The patch series that resolves this problem was merged into mainline Linux during the 4.9 merge window:
("dma-mapping, powerpc, nvme: introduce the DMA_ATTR_NO_WARN attribute")

Can you please apply it on 16.04.x?

Thanks!

Links/commits
---

[1] "dma-mapping: introduce the DMA_ATTR_NO_WARN attribute"
    https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a9a62c9384417545620aee1b5ad1d9357350c17a
[2] "powerpc: implement the DMA_ATTR_NO_WARN attribute"
    https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=af8a24988e48f9ed20acf4d5230ac216d5baf723
[3] "nvme: use the DMA_ATTR_NO_WARN attribute"
    https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2b6b535d9158b822a45080b3d6d5b2993fd49e5a


bugproxy (bugproxy) on 2016-10-13
tags: added: architecture-ppc64le bugnameltc-142128 severity-high targetmilestone-inin16041
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Tim Gardner (timg-tpi) on 2016-10-13
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
assignee: Taco Screen team (taco-screen-team) → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) on 2016-10-26
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi) on 2016-10-26
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety

------- Comment From <email address hidden> 2016-11-21 10:14 EDT-------
Verification OK on Xenial and Yakkety.

tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed-xenial verification-needed-yakkety
Luis Henriques (henrix) wrote :

The backport of the commits that fix this bug introduces a regression in xenial, so these patches are going to be reverted for the current SRU cycle (in xenial only). For completeness, the regression is bug #1644596.

Changed in linux (Ubuntu Xenial):
status: Fix Committed → In Progress
tags: removed: verification-done-xenial
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-28 15:44 EDT-------
Apologies.

I verified the code was present, but we failed to realize that the DMA attributes structure in Xenial differs from mainline.
Applying the patches as-is is what introduced the regression.

The commit "dma-mapping: introduce the DMA_ATTR_NO_WARN attribute" needs to be adapted to the older structure. That is what I submitted in v3 [1,2,3].

I'll provide an actual backport this time.
Sorry for the inconvenience.

[1] https://patchwork.kernel.org/patch/9220769/
[2] https://patchwork.kernel.org/patch/9220777/
[3] https://patchwork.kernel.org/patch/9220783/
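
To check which form of the DMA attributes interface a given kernel source tree uses before applying patches, something like the following could help. This is a minimal sketch resting on one assumption: that trees using the older structure ship `include/linux/dma-attrs.h` (the path that appears in the Xenial commit for this fix), while trees without that header pass attributes some other way. The function name `dma_attrs_style` and its output strings are hypothetical.

```shell
# Report which DMA attributes interface a kernel source tree appears to use.
# Usage: dma_attrs_style /path/to/kernel/tree
# Assumption: the presence of include/linux/dma-attrs.h marks the older,
# structure-based interface; its absence marks a newer interface.
dma_attrs_style() {
    tree=$1
    if [ -f "$tree/include/linux/dma-attrs.h" ]; then
        echo "older structure (dma-attrs.h present)"
    else
        echo "newer interface (dma-attrs.h absent)"
    fi
}
```

Running it against the ubuntu-xenial.git checkout and against a mainline checkout should make the mismatch visible before any cherry-pick is attempted.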

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-29 17:04 EDT-------
Hi Canonical
(@henrix @timg-tpi)

Attaching proper backports of the 3 patches [1] for 16.04.x.

Built and verified on top of ubuntu-xenial.git master-next -- currently, 52.73 at 44b1707bf297 ("kvm/irqchip: kvm_arch_irq_routing_update renaming split").

This is the test case:
a batch of simultaneous 64k writes to several partitions on the NVMe drive.
With the backports applied, no messages are printed to dmesg.

Linux ubuntu 4.4.0-52-generic #73+bz142128v1 SMP Tue Nov 29 13:06:49 CST 2016 ppc64le ppc64le ppc64le GNU/Linux

root@ubuntu:~# dmesg -c >/dev/null; for p in 2 {5..11}; do echo $p; dd if=/dev/zero of=/dev/nvme0n1p$p bs=64k count=128k & done; wait; echo dmesg begin; dmesg; echo dmesg end
<...>
[1] Done dd if=/dev/zero of=/dev/nvme0n1p$p bs=64k count=128k
<...>
[8]+ Done dd if=/dev/zero of=/dev/nvme0n1p$p bs=64k count=128k
dmesg begin
dmesg end
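
The one-liner above can also be split into two small functions so the write phase and the log check are reusable. This is a minimal sketch of the same idea, not the exact command that was run; the function names `parallel_zero_write` and `count_iommu_failures` are hypothetical. Note that it overwrites every target it is given.

```shell
# Write zeros to every target in parallel, then wait for all writers.
# Usage: parallel_zero_write BS COUNT TARGET...
# WARNING: each target is overwritten; use scratch partitions only.
parallel_zero_write() {
    bs=$1; count=$2; shift 2
    for target in "$@"; do
        dd if=/dev/zero of="$target" bs="$bs" count="$count" 2>/dev/null &
    done
    wait   # block until every background dd has finished
}

# Count "iommu_alloc failed" lines in kernel log text read from stdin.
# Prints 0 (and still succeeds) when there are none.
count_iommu_failures() {
    grep -c 'iommu_alloc failed' || true
}

# Example run mirroring the test case (commented out: destructive):
#   dmesg -c >/dev/null
#   parallel_zero_write 64k 128k /dev/nvme0n1p2 /dev/nvme0n1p5 /dev/nvme0n1p6
#   dmesg | count_iommu_failures
```

A count of 0 after the writes corresponds to the empty section between "dmesg begin" and "dmesg end" in the output above.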

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=DMA_ATTR_NO_WARN

------- Comment on attachment From <email address hidden> 2016-11-29 16:59 EDT-------

Backport for 16.04 -- 0001-dma-mapping-introduce-the-DMA_ATTR_NO_WARN-attribute.patch

------- Comment on attachment From <email address hidden> 2016-11-29 17:00 EDT-------

Backport for 16.04 -- 0002-powerpc-implement-the-DMA_ATTR_NO_WARN-attribute.patch

------- Comment on attachment From <email address hidden> 2016-11-29 17:00 EDT-------

Backport for 16.04 -- 0003-nvme-use-the-DMA_ATTR_NO_WARN-attribute.patch

The verification of the Stable Release Update for linux-lts-xenial has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-51.72

---------------
linux (4.4.0-51.72) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1644611

  * 4.4.0-1037-snapdragon #41: kernel panic on boot (LP: #1644596)
    - Revert "dma-mapping: introduce the DMA_ATTR_NO_WARN attribute"
    - Revert "powerpc: implement the DMA_ATTR_NO_WARN attribute"
    - Revert "nvme: use the DMA_ATTR_NO_WARN attribute"

linux (4.4.0-50.71) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1644169

  * xenial 4.4.0-49.70 kernel breaks LXD userspace (LP: #1644165)
    - Revert "UBUNTU: SAUCE: (namespace) fuse: Allow user namespace mounts by
      default"
    - Revert "UBUNTU: SAUCE: (namespace) fs: Don't remove suid for CAP_FSETID for
      userns root"
    - Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Don't remove suid for
      CAP_FSETID in s_user_ns""
    - Revert "UBUNTU: SAUCE: (namespace) fs: Allow superblock owner to change
      ownership of inodes"
    - Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Allow superblock owner to
      change ownership of inodes with unmappable ids""
    - Revert "UBUNTU: SAUCE: (namespace) security/integrity: Harden against
      malformed xattrs"
    - Revert "(namespace) Revert "UBUNTU: SAUCE: ima/evm: Allow root in s_user_ns
      to set xattrs""
    - Revert "(namespace) dquot: For now explicitly don't support filesystems
      outside of init_user_ns"
    - Revert "(namespace) quota: Handle quota data stored in s_user_ns in
      quota_setxquota"
    - Revert "(namespace) quota: Ensure qids map to the filesystem"
    - Revert "(namespace) Revert "UBUNTU: SAUCE: quota: Convert ids relative to
      s_user_ns""
    - Revert "(namespace) Revert "UBUNTU: SAUCE: quota: Require that qids passed
      to dqget() be valid and map into s_user_ns""
    - Revert "(namespace) vfs: Don't create inodes with a uid or gid unknown to
      the vfs"
    - Revert "(namespace) vfs: Don't modify inodes with a uid or gid unknown to
      the vfs"
    - Revert "UBUNTU: SAUCE: (namespace) fuse: Translate ids in posix acl xattrs"
    - Revert "UBUNTU: SAUCE: (namespace) posix_acl: Export
      posix_acl_fix_xattr_userns() to modules"
    - Revert "(namespace) vfs: Verify acls are valid within superblock's
      s_user_ns."
    - Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Update posix_acl support to
      handle user namespace mounts""
    - Revert "(namespace) fs: Refuse uid/gid changes which don't map into
      s_user_ns"
    - Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Refuse uid/gid changes which
      don't map into s_user_ns""
    - Revert "(namespace) mnt: Move the FS_USERNS_MOUNT check into sget_userns"

linux (4.4.0-49.70) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1640921

  * Infiniband driver (kernel module) needed for Azure (LP: #1641139)
    - SAUCE: RDMA Infiniband for Windows Azure
    - [Config] CONFIG_HYPERV_INFINIBAND_ND=m
    - SAUCE: Makefile RDMA infiniband driver for Windows Azure
    - [Config] Add hv_network_direct.ko to generic inclusion list
    - SAUCE: RDMA Infiniband for Windows Azure is dependent on amd64...

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-28.30

---------------
linux (4.8.0-28.30) yakkety; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1641083

  * lxc-attach to malicious container allows access to host (LP: #1639345)
    - Revert "UBUNTU: SAUCE: (noup) ptrace: being capable wrt a process requires
      mapped uids/gids"
    - (upstream) mm: Add a user_ns owner to mm_struct and fix ptrace permission
      checks

  * [Feature] AVX-512 new instruction sets (avx512_4vnniw, avx512_4fmaps)
    (LP: #1637526)
    - x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features

  * zfs: importing zpool with vdev on zvol hangs kernel (LP: #1636517)
    - SAUCE: (noup) Update zfs to 0.6.5.8-0ubuntu4.1

  * Move some device drivers build from kernel built-in to modules
    (LP: #1637303)
    - [Config] CONFIG_TIGON3=m for all arches
    - [Config] CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m

  * I2C touchpad does not work on AMD platform (LP: #1612006)
    - pinctrl/amd: Configure GPIO register using BIOS settings

  * guest experiencing Transmit Timeouts on CX4 (LP: #1636330)
    - powerpc/64: Re-fix race condition between going idle and entering guest
    - powerpc/64: Fix race condition in setting lock bit in idle/wakeup code

  * QEMU throws failure msg while booting guest with SRIOV VF (LP: #1630554)
    - KVM: PPC: Always select KVM_VFIO, plus Makefile cleanup

  * [Feature] KBL - New device ID for Kabypoint(KbP) (LP: #1591618)
    - SAUCE: mfd: lpss: Fix Intel Kaby Lake PCH-H properties

  * hio: SSD data corruption under stress test (LP: #1638700)
    - SAUCE: hio: set bi_error field to signal an I/O error on a BIO
    - SAUCE: hio: splitting bio in the entry of .make_request_fn

  * cleanup primary tree for linux-hwe layering issues (LP: #1637473)
    - [Config] switch Vcs-Git: to yakkety repository
    - [Packaging] handle both linux-lts* and linux-hwe* as backports
    - [Config] linux-tools-common and linux-cloud-tools-common are one per series
    - [Config] linux-source-* is in the primary linux namespace
    - [Config] linux-tools -- always suggest the base package

  * SRU: sync zfsutils-linux and spl-linux changes to linux (LP: #1635656)
    - SAUCE: (noup) Update spl to 0.6.5.8-2, zfs to 0.6.5.8-0ubuntu4 (LP:
      #1635656)

  * [Feature] SKX: perf uncore PMU support (LP: #1591810)
    - perf/x86/intel/uncore: Add Skylake server uncore support
    - perf/x86/intel/uncore: Remove hard-coded implementation for Node ID mapping
      location
    - perf/x86/intel/uncore: Handle non-standard counter offset

  * [Feature] Purley: Memory Protection Keys (LP: #1591804)
    - x86/pkeys: Add fault handling for PF_PK page fault bit
    - mm: Implement new pkey_mprotect() system call
    - x86/pkeys: Make mprotect_key() mask off additional vm_flags
    - x86/pkeys: Allocation/free syscalls
    - x86: Wire up protection keys system calls
    - generic syscalls: Wire up memory protection keys syscalls
    - pkeys: Add details of system call use to Documentation/
    - x86/pkeys: Default to a restrictive init PKRU
    - x86/pkeys: Allow configuration of init_pkru
    - x86/pkeys: Add self-tests

  * kernel invalid ...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released

The PATCH v2 series (backports for Xenial) has been submitted to the kernel-team mailing list [1], and ACKed by Tim.

[1] https://lists.ubuntu.com/archives/kernel-team/2016-November/081197.html

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-30.32

---------------
linux (4.8.0-30.32) yakkety; urgency=low

  * CVE-2016-8655 (LP: #1646318)
    - packet: fix race condition in packet_set_ring

 -- Brad Figg <email address hidden> Thu, 01 Dec 2016 08:02:53 -0800

Changed in linux (Ubuntu):
status: In Progress → Fix Released

------- Comment From <email address hidden> 2016-12-09 07:22 EDT-------
The v2 patches have been applied to the ubuntu-xenial.git master-next branch, so they should be included in the next kernel update.

1/3 dma-mapping: introduce the DMA_ATTR_NO_WARN attribute
http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/include/linux/dma-attrs.h?h=master-next&id=ad05f50fcda5a78b2dbc5331d7d8d08fc2b51fac

2/3 powerpc: implement the DMA_ATTR_NO_WARN attribute
http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/arch/powerpc/kernel/iommu.c?h=master-next&id=0cd611da7d4c01b178144bc17da8cd92cae2b1fa

3/3 nvme: use the DMA_ATTR_NO_WARN attribute
http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/drivers/nvme/host/pci.c?h=master-next&id=6901066064fa435b701ae66ee24e66465916ef4b
