[UBUNTU 21.04] vfio: pass DMA availability information to userspace

Bug #1907421 reported by bugproxy on 2020-12-09
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Undecided
Skipper Bug Screeners
linux (Ubuntu)
Status tracked in Hirsute
Focal
Undecided
Frank Heimes
Groovy
Undecided
Frank Heimes
Hirsute
Undecided
Unassigned

Bug Description

Description: vfio: pass DMA availability information to userspace
Symptom: vfio-pci device on s390 enters error state
Problem: Commit 492855939bdb added a limit to the number of concurrent
               DMA requests for a vfio container. However, lazy unmapping in
               s390 can in fact cause quite a large number of outstanding DMA
               requests to build up prior to being purged, potentially the
               entire guest DMA space. This results in unexpected errors seen
               in qemu such as 'VFIO_MAP_DMA failed: No space left on device'
Solution: The solution requires a change to both kernel and qemu - For
               the kernel, add the ability to provide the number of allowable
               DMA requests via the VFIO_IOMMU_GET_INFO ioctl.
Reproduction: Put a vfio-pci device on s390 under I/O load
Upstream-ID: a717072007e8aedd3f951726d8cf55454860b30d
               7d6e1329652ed971d1b6e0e7bea66fba5044e271

Need also to be integrated into 20.10 and 20.04.

OK, just to clarify we don't need to fix bionic for this one, but rather focal (20.04) and groovy (20.10). Furthermore, for 20.04, 20.10 and 21.04 ONLY commit 7d6e1329652ed971d1b6e0e7bea66fba5044e271 is needed, the other was a pre-req that is already present.
__________

SRU Justification:

[Impact]

* In case a vfio-pci device on s390x is under I/O load, vfio-pci device may end up in error state.

* The commit 492855939bdb added a limit to the number of concurrent DMA requests for a vfio container.

* However, lazy unmapping in s390x can in fact cause quite a large number of outstanding DMA requests
  to build up prior to being purged - potentially the entire guest DMA space.

* This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed: No space left on device'.

* The solution requires a change to both kernel and qemu.

* The kernel side of things is addressed by this SRU.

* The fix adds the ability to provide the number of allowable DMA requests via VFIO_IOMMU_GET_INFO ioctl.

* The actual fix comes with commit 7d6e1329652e, but another fix ccd59dce1a21 is needed to get it cleanly applied.

[Fix]

* ccd59dce1a21f473518bf273bdf5b182bab955b3 ccd59dce1a21 "vfio/type1: Refactor vfio_iommu_type1_ioctl()"

* 7d6e1329652ed971d1b6e0e7bea66fba5044e271 7d6e1329652e "vfio iommu: Add dma available capability"

[Test Case]

* IBM Z or LinuxONE hardware with Ubuntu Server 20.10 installed.

* PCIe adapters in place that provide vfio, like RoCE Express 2.

* A KVM host needs to be setup and a KVM guest (use again 20.10) that uses vfio.

* Generate I/O that flows through the vf and watch out for error like 'VFIO_MAP_DMA failed: No space left on device' in the log.

[Regression Potential]

* The first patch ccd59dce1a21 modifies the common code file drivers/vfio/vfio_iommu_type1.c quity significantly.

* But the reason is not that it introduces a lot of new things, it's a refactoring patch.

* Nevertheless if done in a bad way it can significantly harm the IO memory management of virtual function adapters.

* In worst case it may break them entirely, instead of 'just' exeeding the entire DMA space.

* Things could also go wrong while doing the mapping and unmapping of DMA, that may even have an impact beyond vf adapters - harming other DMA devices.

* The handling of dirty pages is also touched and the ioctl itself - which is important to keep the control of the devices.

* But as said before, it re-factoring work, it's upstream accepted since 5.9 and the provenance shows that many engineers had an eye on these changes.

* The second patch 7d6e1329652e - that inclides the needed fix - comes with far less modifications.

* It also tounches drivers/vfio/vfio_iommu_type1.c but in a way that it mainly add one new function (vfio_iommu_dma_avail_build_caps) and an add. if statement in vfio_iommu_type1_get_info.

* That allows to provide the current allowable number of DMA mappings to the userspace via the IOMMU info chain,
  so that the userspace can take appropriate mitigation.

* Potential problems here can be that the current allowable number of DMA mappings are wrong and in best case just mappings are wasted and in worst case there are more reported than available in reality, which could have a severe impact.

* What happens in such a case is a bit depending on the userspace.

* This patch got upstream accepted with kernel 5.10 and pre-tested by IBM.

* In addition a PPA was created with a patched groovy kernel that was shared for further testing.

[Other]

* The patch got upstream accepted with kernel v5.10, hence it will land in Hirsute once the target kernel 5.10 is in.

* For 5.4 this will come as upstream stable update 5.4.90/91.

* For 5.8 (and 5.9) upstream stable support already ended, hence this SRU is only needed for groovy.

CVE References

bugproxy (bugproxy) on 2020-12-09
tags: added: architecture-s39064 bugnameltc-190211 severity-high targetmilestone-inin2104
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

The patch is in Ubuntu UNSTABLE (that will likely become the kernel for hirsute) as:
7d6e1329652e "vfio iommu: Add dma available capability"
starting with:
Ubuntu-5.10-5.10.0-0.1
Hence updating Hirsute entry to 'In Progress'.

Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu Hirsute):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
Changed in linux (Ubuntu Groovy):
assignee: nobody → Frank Heimes (fheimes)
Changed in linux (Ubuntu Focal):
assignee: nobody → Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
Frank Heimes (fheimes) wrote :

Trying to cherry-pick 7d6e1329652e from focal master-next or groovy master-next ends up with a significant merge conflict.
I guess proper backports are needed and should certainly be addressed via upstream stable, since it's all common code and that would address the maintenance burden for the changes and lower the risk.

Changed in linux (Ubuntu Focal):
status: New → Incomplete
Changed in linux (Ubuntu Groovy):
status: New → Incomplete
Changed in ubuntu-z-systems:
status: Triaged → Incomplete

------- Comment From <email address hidden> 2021-01-19 15:52 EDT-------
The associated commit is now available in 5.4 stable as of 5.4.90 -- the commit ID from the 5.4.y stable branch is 5597557244d44989a7310755d0a8ec40ae603acb 'vfio iommu: Add dma available capability'

Frank Heimes (fheimes) wrote :

Hi Matthew,
I'm updating the hirsute entry to Fix Committed, since the 5.10 kernel is in hirsute-proposed
and the requested patch is included:
Ubuntu-5.10.0-12.13
v5.10
So we are good with 21.04/hirsute.

And it's great that see that you've managed to get it accepted via upstream stable, since this provides a lot of confidence about the fix that we need for an LTS SRU - and since we have a process in place to pick up stable (upstream) updates, that will find (semi-automatically) it's way into 20.04/focal's Ubuntu kernel 5.4 soon.
So this is highly appreciated.

However, to avoid any regression for people who upgrade from an Ubuntu release n to n+1 (in this case 20.04/focal to 20.10/groovy) one requirement of our SRU process is to get the patches in to the Ubuntu releases/kernels starting from newest to the oldest Ubuntu release the patch needs to be applied to.
Hence me trying to get the patch applied to the groovy master-next tree, based on kernel 5.8, which failed (LP comment #2) (since we knew it's already in 5.10 which is the target for 21.04).
Well, I can retry to apply the patch to groovy master-next, since the groovy master-next tree evolved over time, but I guess it still does not apply - hence me asking for a backport (#2)
Btw. even if 5.8.y ran out of upstream stable support, we still look at the 5.9.y stable updates for inclusion into 5.8. So tagging the fix for stable 5.9.y would be an option, too.

Frank Heimes (fheimes) on 2021-01-20
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: Incomplete → Confirmed
Changed in ubuntu-z-systems:
status: Incomplete → Confirmed

Hi,
You also mentioned qemu in the description, I think what is related (by title and it has your name) is the following series + one fixup:

77280d33b s390x: fix build for --without-default-devices
37fa32de70 s390x/pci: Honor DMA limits set by vfio
cd7498d07f s390x/pci: Add routine to get the vfio dma available count
7486a62845 vfio: Find DMA available capability
3ab7a0b40d vfio: Create shared routine for scanning info capabilities
408b55db8b s390x/pci: Move header files to include/hw/s390x
53ba2eee52 linux-headers: update against 5.10-rc1
84567ea763 update-linux-headers: Add vfio_zdev.h

That is in qemu 5.2 - and if that "is it" then in Hirsute things are already fixed by now.
But that is quite a list and for backports we will need to reduce that to the minimum that is needed. Did you check if you can (or have even a branch somewhere) backport these on top of 4.2 and/or 5.0 ?

Also if this shall be an SRU we would need a testcase, is there anything that can be isolated as test instructions?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-01-20 09:51 EDT-------
Re: the kernel patch to groovy... My understanding is that the window for both 5.8.y and 5.9.y are already closed since they are not LTS releases... The 5.10 window is still open and will be for a while as the most recent LTS -- the fix is already in 5.10.

The upstream 5.10 fix is very close to fitting on groovy. The significant difference with groovy is that vfio_iommu_type1_get_info does not yet exist, so an identical change has to be made in vfio_iommu_type1_ioctl instead (vfio_iommu_type1_ioctl got turned into a switch statement that calls sub-functions by ccd59dce1a21, so code got moved - 5.4 backport needed further changes, but you can see the same sort of accommodation there).

I could provide a patch that does this fitting. Or, it looks like upstream ccd59dce1a21 + 7d6e1329652e fits cleanly on groovy.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-01-20 10:02 EDT-------
re: backporting of the qemu patches... I was planning to open a separate bug for this once we determined whether the kernel fix would actually get integrated into focal/groovy.

The primary difference from 5.2 and focal/groovy would be the state of the qemu build system as it went through some recent changes, so there may be some accomodation required to backport. The list you provided is mostly accurate. We can definitely skip:
84567ea763 update-linux-headers: Add vfio_zdev.h
as this is for a different feature that integrated in the same pull request, not part of this bugfix

And we might also be able to skip:
408b55db8b s390x/pci: Move header files to include/hw/s390x
but this would require a minor header change to an include statment.

Testing for the kernel alone is tricky (you need to create a vfio-pci device and then issue the VFIO_IOMMU_GET_INFO ioctl and validate that the DMA limit is included in the response -- I've used a custom QEMU build to do that). Creating a test when both kernel and QEMU changes are available is much easier (but does still require special PCI hardware like a mellanox adapter to pass through to the guest).

Frank Heimes (fheimes) wrote :

Regarding comment #6:
(Didn't noticed that the 5.9 series is also closed now - nevermind ...)
The commit ccd59dce1a21 is indeed the missing puzzle piece.
I was just surprised that just one commit fixes the significant delta,
but since it contains a huge chunk of refactoring work:
"vfio/type1: Refactor vfio_iommu_type1_ioctl()"
this explains it.
I just tried to cherry pick this one (ccd59dce1a21) and the initial one on top (7d6e1329652e) and everything applied cleanly.

With that I can do a manual kernel SRU to groovy, so that we can fix the potential upgrade regression situation.

Thanks for letting me know the missing one ...

And reading comment #7 I think we can (well, actually need) to ask you for the verification tests once the kernel fixes are in, since you have the right hw and a custom qemu.

Changed in linux (Ubuntu Groovy):
status: Incomplete → Confirmed
Frank Heimes (fheimes) wrote :

A PPA was created with a patched groovy/20.10 kernel for further testing:
https://people.canonical.com/~fheimes/lp1907421

Frank Heimes (fheimes) wrote :

Kernel SRU request submitted for groovy:
https://lists.ubuntu.com/archives/kernel-team/2021-January/thread.html#116618
changing status to 'In Progress'.

description: updated
Changed in linux (Ubuntu Groovy):
status: Confirmed → In Progress

> ------- Comment From <email address hidden> 2021-01-20 10:02 EDT-------
> re: backporting of the qemu patches... I was planning to open a separate bug for this once we determined whether the kernel fix would actually get integrated into focal/groovy.

Ok and thanks for the extra details that followed.
In that case I'll ignore this bug here (as it is for now kernel-only)
and you (Or Frank) will let me know when the time for qemu has come.
In advance - if you happened to try and backported the changes needed
to 4.2/5.0 then you can as well point me to them.
I'll then happily review, test and re-use those. I haven't checked
these commits in detail, but sometimes those backports are much easier
for the Author :-)
Waiting until the qemu bug shows up then ...

o/
Christian

------- Comment From <email address hidden> 2021-01-21 17:05 EDT-------
Hi Frank, I applied the groovy PPA above and, using a custom qemu build, verified that the DMA limit is now being properly provided to userspace. Happy to do the same for focal once the fix is there -- Just let me know.

I'll send along a separate bugzilla for the related qemu fix with proposed 4.2/5.0 backports early next week. Thanks!

Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-01-29 03:46 EDT-------
Already sucessfully verified by IBM

Frank Heimes (fheimes) wrote :

Thx, I'm adjusting the tag accordingly ...

tags: added: verification-done-groovy
removed: verification-needed-groovy
Frank Heimes (fheimes) wrote :

The fix is currently prepared for focal based on
LP 1913487 - "Focal update: v5.4.90 upstream stable release"
https://bugs.launchpad.net/bugs/1913487
Hence aligning the status of the focal entry of this bug to LP1913487
and updating the status to In Progress.

Changed in linux (Ubuntu Focal):
status: Confirmed → In Progress
Changed in ubuntu-z-systems:
status: Confirmed → In Progress
Frank Heimes (fheimes) on 2021-02-19
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Frank Heimes (fheimes) wrote :

Now that kernel 5.10 landed in hirsute's release pocket:
linux-generic | 5.10.0.14.16 | hirsute
the 'hirsute' part can be updated to 'Fix Released".

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) wrote :

For groovy the patches are in Ubuntu-5.8.0-44.50

Launchpad Janitor (janitor) wrote :
Download full text (129.8 KiB)

This bug was fixed in the package linux - 5.8.0-44.50

---------------
linux (5.8.0-44.50) groovy; urgency=medium

  * groovy/linux: 5.8.0-44.50 -proposed tracker (LP: #1914805)

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 460-server series and update the 460 series
    (LP: #1913200)
    - [Config] dkms-versions -- drop NVIDIA 435 455 and 440-server
    - [Config] dkms-versions -- add the 460-server nvidia driver

  * [SRU][G/H/U/OEM-5.10] re-enable s0ix of e1000e (LP: #1910541)
    - Revert "UBUNTU: SAUCE: e1000e: bump up timeout to wait when ME un-configure
      ULP mode"
    - e1000e: Only run S0ix flows if shutdown succeeded
    - Revert "e1000e: disable s0ix entry and exit flows for ME systems"
    - e1000e: Export S0ix flags to ethtool

  * suspend only works once on ThinkPad X1 Carbon gen 7 (LP: #1865570) //
    [SRU][G/H/U/OEM-5.10] re-enable s0ix of e1000e (LP: #1910541)
    - e1000e: bump up timeout to wait when ME un-configures ULP mode

  * Cannot probe sata disk on sata controller behind VMD: ata1.00: failed to
    IDENTIFY (I/O error, err_mask=0x4) (LP: #1894778)
    - PCI: vmd: Offset Client VMD MSI-X vectors

  * Enable mute and micmute LED on HP EliteBook 850 G7 (LP: #1910102)
    - ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7

  * SYNA30B4:00 06CB:CE09 Mouse on HP EliteBook 850 G7 not working at all
    (LP: #1908992)
    - HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * [UBUNTU 21.04] vfio: pass DMA availability information to userspace
    (LP: #1907421)
    - vfio/type1: Refactor vfio_iommu_type1_ioctl()
    - vfio iommu: Add dma available capability

  * qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP
    tx csum offload (LP: #1909062)
    - qede: fix offload for IPIP tunnel packets

  * Use DCPD to control HP DreamColor panel (LP: #1911001)
    - SAUCE: drm/dp: Another HP DreamColor panel brigntness fix

  * Fix right sounds and mute/micmute LEDs for HP ZBook Fury 15/17 G7 Mobile
    Workstation (LP: #1910561)
    - ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines

  * Ubuntu 20.04 - multicast counter is not increased in ip -s (LP: #1901842)
    - net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

  * eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 /
    P9 (LP: #1882503)
    - selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

  * DMI entry syntax fix for Pegatron /...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) wrote :

Commit 6507174efc67 "vfio iommu: Add dma available capability" comes with Ubuntu-5.4.0-67.
It's still in proposed as of today.

Frank Heimes (fheimes) wrote :

Patch "vfio iommu: Add dma available capability" now landed as 6507174efc67 in focal-updates and is part of Ubuntu-5.4.0-67.
Hence updating the 'affects focal' entry as Fix Released and with that closing the entire ticket.

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-03-16 03:14 EDT-------
IBM BuzgilL status->closed, Fix Released with all requested distros

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers