Support non-strict iommu mode on arm64

Bug #1806488 reported by dann frazier
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
dann frazier
Bionic
Fix Released
Undecided
dann frazier
Cosmic
Fix Released
Undecided
dann frazier
Disco
Fix Released
Undecided
dann frazier

Bug Description

[Impact]
The Intel IOMMU driver provides an option for strict mode. When disabled, batching of IOTLB flush operations is permitted, allowing the user to trade-off isolation for improved performance. Ubuntu's kernel currently lacks a parity for this feature for ARM.

There's a significant performance gain to be had by removing the need to flush the IOMMU TLB on every unmap on arm64. I'm seeing a 25% performance gain w/ fio reads on a single NVMe device.

This mode of operation is available for x86 via the "intel_iommu=strict" parameter. Upstream now exposes an equivalent feature for ARM platforms via the "iommu.strict=[0|1]" parameter, while retaining the default strict-enabled mode.

[Test Case]
Run fio with the following config before and after applying the patches and collection IOPS count. Run again after applying the patches. Finally, run a 3rd time after adding iommu.strict=0 to the kernel commandline.

Performance should not regress after the update. Performance should further improve after adding iommu.strict=0 - but if it doesn't for some reason, that is not a regression.

$ cat fio.rc
[global]
rw=read
direct=1
ioengine=libaio
iodepth=2048
numjobs=10
bs=4k
group_reporting=1
group_reporting=1
cpumask=0xff
runtime=100
loops = 10000

[job1]
filename=/dev/nvme0n1

[Fix]
44f6876a00e83 iommu/arm-smmu: Support non-strict mode
b2dfeba654cb0 iommu/io-pgtable-arm-v7s: Add support for non-strict mode
9662b99a19abc iommu/arm-smmu-v3: Add support for non-strict mode
b6b65ca20bc93 iommu/io-pgtable-arm: Add support for non-strict mode
68a6efe86f6a1 iommu: Add "iommu.strict" command line option
2da274cdf998a iommu/dma: Add support for non-strict mode
07fdef34d2be6 iommu/arm-smmu-v3: Implement flush_iotlb_all hook
85c7a0f1ef624 iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()

[Regression Risk]
Most of these patches are specific to ARM, and have been regression tested on both arm64 (HiSilicon D06) and armhf (QEMU virt) using "stress-ng --vm $(nproc)"

2 patches do touch arch-indep code however:

> 68a6efe86f6a1 iommu: Add "iommu.strict" command line option
Adds a new command line option and sets an attribute that iommu drivers can optionally react to. Doesn't change default behavior.

> 2da274cdf998a iommu/dma: Add support for non-strict mode
This driver is only built for arm64 and ppc64el (determined by looking at the build logs). Most of this patch only changes behavior in the non-default (and new) iommu.strict=0 case. The exception, which is called out in the commit message, is this hunk:

- WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+ WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
+ if (!cookie->fq_domain)
+ iommu_tlb_sync(domain);

In the default case, where fq_domain will be NULl, we are now factoring iommu_unmap() into:
  iommu_unmap_fast()
  iommu_tlb_sync()

Looking at the source to iommu_unmap() confirms that this is functionally equivalent.

dann frazier (dannf)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Bionic):
assignee: nobody → dann frazier (dannf)
dann frazier (dannf)
description: updated
dann frazier (dannf)
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
dann frazier (dannf)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Seth Forshee (sforshee)
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
tags: added: verification-needed-bionic
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

dann frazier (dannf)
description: updated
Revision history for this message
Ike Panhc (ikepanhc) wrote :

Performance on NVMe has increased a lot.

-update kernel
Run status group 0 (all jobs):
   READ: bw=1358MiB/s (1424MB/s), 1358MiB/s-1358MiB/s (1424MB/s-1424MB/s), io=133GiB (142GB), run=100009-100009msec

-proposed kernel
Run status group 0 (all jobs):
   READ: bw=1406MiB/s (1475MB/s), 1406MiB/s-1406MiB/s (1475MB/s-1475MB/s), io=137GiB (147GB), run=100005-100005msec

-proposed kernel with iommu.strict=0
Run status group 0 (all jobs):
   READ: bw=1732MiB/s (1816MB/s), 1732MiB/s-1732MiB/s (1816MB/s-1816MB/s), io=169GiB (182GB), run=100004-100004msec

tags: added: verification-done-bionic verification-done-cosmic
removed: verification-needed-bionic verification-needed-cosmic
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (47.0 KiB)

This bug was fixed in the package linux - 4.15.0-44.47

---------------
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: pass in enum wbt_flags to get_rq_wait()
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't support
      1.8v
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
    - mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - perf: Export perf_event_update_userpage
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
    - SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi BD"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when
      resetting"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling roce
      callback function"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback
      function when link status change"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of notifying
      roce client"
    - Revert "UBUNTU: S...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (56.3 KiB)

This bug was fixed in the package linux - 4.18.0-14.15

---------------
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
      kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: c...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (14.1 KiB)

This bug was fixed in the package linux - 4.19.0-12.13

---------------
linux (4.19.0-12.13) disco; urgency=medium

  * linux: 4.19.0-12.13 -proposed tracker (LP: #1813664)

  * kernel oops in bcache module (LP: #1793901)
    - SAUCE: bcache: never writeback a discard operation

  * Disco update: 4.19.18 upstream stable release (LP: #1813611)
    - ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
    - mlxsw: spectrum: Disable lag port TX before removing it
    - mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
    - net: dsa: mv88x6xxx: mv88e6390 errata
    - net, skbuff: do not prefer skb allocation fails early
    - qmi_wwan: add MTU default to qmap network interface
    - ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
    - net: clear skb->tstamp in bridge forwarding path
    - netfilter: ipset: Allow matching on destination MAC address for mac and
      ipmac sets
    - gpio: pl061: Move irq_chip definition inside struct pl061
    - drm/amd/display: Guard against null stream_state in set_crc_source
    - drm/amdkfd: fix interrupt spin lock
    - ixgbe: allow IPsec Tx offload in VEPA mode
    - platform/x86: asus-wmi: Tell the EC the OS will handle the display off
      hotkey
    - e1000e: allow non-monotonic SYSTIM readings
    - usb: typec: tcpm: Do not disconnect link for self powered devices
    - selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
    - of: overlay: add missing of_node_put() after add new node to changeset
    - writeback: don't decrement wb->refcnt if !wb->bdi
    - serial: set suppress_bind_attrs flag only if builtin
    - bpf: Allow narrow loads with offset > 0
    - ALSA: oxfw: add support for APOGEE duet FireWire
    - x86/mce: Fix -Wmissing-prototypes warnings
    - MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
    - crypto: ecc - regularize scalar for scalar multiplication
    - arm64: perf: set suppress_bind_attrs flag to true
    - drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
    - clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
    - samples: bpf: fix: error handling regarding kprobe_events
    - usb: gadget: udc: renesas_usb3: add a safety connection way for
      forced_b_device
    - fpga: altera-cvp: fix probing for multiple FPGAs on the bus
    - selinux: always allow mounting submounts
    - ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
    - scsi: qedi: Check for session online before getting iSCSI TLV data.
    - drm/amdgpu: Reorder uvd ring init before uvd resume
    - rxe: IB_WR_REG_MR does not capture MR's iova field
    - efi/libstub: Disable some warnings for x86{,_64}
    - jffs2: Fix use of uninitialized delayed_work, lockdep breakage
    - clk: imx: make mux parent strings const
    - pstore/ram: Do not treat empty buffers as valid
    - media: uvcvideo: Refactor teardown of uvc on USB disconnect
    - powerpc/xmon: Fix invocation inside lock region
    - powerpc/pseries/cpuidle: Fix preempt warning
    - media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
    - ASoC: use dma_ops of parent device for acp_audio_dma
    - media: ve...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers