Support non-strict iommu mode on arm64

Bug #1806488 reported by dann frazier on 2018-12-03
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Status tracked in Disco
Bionic
Undecided
dann frazier
Cosmic
Undecided
dann frazier
Disco
Undecided
dann frazier

Bug Description

[Impact]
The Intel IOMMU driver provides an option for strict mode. When disabled, batching of IOTLB flush operations is permitted, allowing the user to trade-off isolation for improved performance. Ubuntu's kernel currently lacks a parity for this feature for ARM.

There's a significant performance gain to be had by removing the need to flush the IOMMU TLB on every unmap on arm64. I'm seeing a 25% performance gain w/ fio reads on a single NVMe device.

This mode of operation is available for x86 via the "intel_iommu=strict" parameter. Upstream now exposes an equivalent feature for ARM platforms via the "iommu.strict=[0|1]" parameter, while retaining the default strict-enabled mode.

[Test Case]
Run fio with the following config before and after applying the patches and collection IOPS count. Run again after applying the patches. Finally, run a 3rd time after adding iommu.strict=0 to the kernel commandline.

Performance should not regress after the update. Performance should further improve after adding iommu.strict=0 - but if it doesn't for some reason, that is not a regression.

$ cat fio.rc
[global]
rw=read
direct=1
ioengine=libaio
iodepth=2048
numjobs=10
bs=4k
group_reporting=1
group_reporting=1
cpumask=0xff
runtime=100
loops = 10000

[job1]
filename=/dev/nvme0n1

[Fix]
44f6876a00e83 iommu/arm-smmu: Support non-strict mode
b2dfeba654cb0 iommu/io-pgtable-arm-v7s: Add support for non-strict mode
9662b99a19abc iommu/arm-smmu-v3: Add support for non-strict mode
b6b65ca20bc93 iommu/io-pgtable-arm: Add support for non-strict mode
68a6efe86f6a1 iommu: Add "iommu.strict" command line option
2da274cdf998a iommu/dma: Add support for non-strict mode
07fdef34d2be6 iommu/arm-smmu-v3: Implement flush_iotlb_all hook
85c7a0f1ef624 iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()

[Regression Risk]
Most of these patches are specific to ARM, and have been regression tested on both arm64 (HiSilicon D06) and armhf (QEMU virt) using "stress-ng --vm $(nproc)"

2 patches do touch arch-indep code however:

> 68a6efe86f6a1 iommu: Add "iommu.strict" command line option
Adds a new command line option and sets an attribute that iommu drivers can optionally react to. Doesn't change default behavior.

> 2da274cdf998a iommu/dma: Add support for non-strict mode
This driver is only built for arm64 and ppc64el (determined by looking at the build logs). Most of this patch only changes behavior in the non-default (and new) iommu.strict=0 case. The exception, which is called out in the commit message, is this hunk:

- WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+ WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
+ if (!cookie->fq_domain)
+ iommu_tlb_sync(domain);

In the default case, where fq_domain will be NULl, we are now factoring iommu_unmap() into:
  iommu_unmap_fast()
  iommu_tlb_sync()

Looking at the source to iommu_unmap() confirms that this is functionally equivalent.

dann frazier (dannf) on 2018-12-14
Changed in linux (Ubuntu Cosmic):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Bionic):
assignee: nobody → dann frazier (dannf)
dann frazier (dannf) on 2018-12-19
description: updated
dann frazier (dannf) on 2018-12-19
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
dann frazier (dannf) on 2018-12-21
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Seth Forshee (sforshee) on 2019-01-09
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
tags: added: verification-needed-bionic
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

dann frazier (dannf) on 2019-01-16
description: updated
Ike Panhc (ikepanhc) wrote :

Performance on NVMe has increased a lot.

-update kernel
Run status group 0 (all jobs):
   READ: bw=1358MiB/s (1424MB/s), 1358MiB/s-1358MiB/s (1424MB/s-1424MB/s), io=133GiB (142GB), run=100009-100009msec

-proposed kernel
Run status group 0 (all jobs):
   READ: bw=1406MiB/s (1475MB/s), 1406MiB/s-1406MiB/s (1475MB/s-1475MB/s), io=137GiB (147GB), run=100005-100005msec

-proposed kernel with iommu.strict=0
Run status group 0 (all jobs):
   READ: bw=1732MiB/s (1816MB/s), 1732MiB/s-1732MiB/s (1816MB/s-1816MB/s), io=169GiB (182GB), run=100004-100004msec

tags: added: verification-done-bionic verification-done-cosmic
removed: verification-needed-bionic verification-needed-cosmic
Launchpad Janitor (janitor) wrote :
Download full text (47.0 KiB)

This bug was fixed in the package linux - 4.15.0-44.47

---------------
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: pass in enum wbt_flags to get_rq_wait()
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't support
      1.8v
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
    - mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - perf: Export perf_event_update_userpage
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
    - SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi BD"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when
      resetting"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling roce
      callback function"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback
      function when link status change"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of notifying
      roce client"
    - Revert "UBUNTU: S...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (56.3 KiB)

This bug was fixed in the package linux - 4.18.0-14.15

---------------
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
      kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: c...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (14.1 KiB)

This bug was fixed in the package linux - 4.19.0-12.13

---------------
linux (4.19.0-12.13) disco; urgency=medium

  * linux: 4.19.0-12.13 -proposed tracker (LP: #1813664)

  * kernel oops in bcache module (LP: #1793901)
    - SAUCE: bcache: never writeback a discard operation

  * Disco update: 4.19.18 upstream stable release (LP: #1813611)
    - ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
    - mlxsw: spectrum: Disable lag port TX before removing it
    - mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
    - net: dsa: mv88x6xxx: mv88e6390 errata
    - net, skbuff: do not prefer skb allocation fails early
    - qmi_wwan: add MTU default to qmap network interface
    - ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
    - net: clear skb->tstamp in bridge forwarding path
    - netfilter: ipset: Allow matching on destination MAC address for mac and
      ipmac sets
    - gpio: pl061: Move irq_chip definition inside struct pl061
    - drm/amd/display: Guard against null stream_state in set_crc_source
    - drm/amdkfd: fix interrupt spin lock
    - ixgbe: allow IPsec Tx offload in VEPA mode
    - platform/x86: asus-wmi: Tell the EC the OS will handle the display off
      hotkey
    - e1000e: allow non-monotonic SYSTIM readings
    - usb: typec: tcpm: Do not disconnect link for self powered devices
    - selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
    - of: overlay: add missing of_node_put() after add new node to changeset
    - writeback: don't decrement wb->refcnt if !wb->bdi
    - serial: set suppress_bind_attrs flag only if builtin
    - bpf: Allow narrow loads with offset > 0
    - ALSA: oxfw: add support for APOGEE duet FireWire
    - x86/mce: Fix -Wmissing-prototypes warnings
    - MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
    - crypto: ecc - regularize scalar for scalar multiplication
    - arm64: perf: set suppress_bind_attrs flag to true
    - drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
    - clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
    - samples: bpf: fix: error handling regarding kprobe_events
    - usb: gadget: udc: renesas_usb3: add a safety connection way for
      forced_b_device
    - fpga: altera-cvp: fix probing for multiple FPGAs on the bus
    - selinux: always allow mounting submounts
    - ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
    - scsi: qedi: Check for session online before getting iSCSI TLV data.
    - drm/amdgpu: Reorder uvd ring init before uvd resume
    - rxe: IB_WR_REG_MR does not capture MR's iova field
    - efi/libstub: Disable some warnings for x86{,_64}
    - jffs2: Fix use of uninitialized delayed_work, lockdep breakage
    - clk: imx: make mux parent strings const
    - pstore/ram: Do not treat empty buffers as valid
    - media: uvcvideo: Refactor teardown of uvc on USB disconnect
    - powerpc/xmon: Fix invocation inside lock region
    - powerpc/pseries/cpuidle: Fix preempt warning
    - media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
    - ASoC: use dma_ops of parent device for acp_audio_dma
    - media: ve...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers