Activity log for bug #1913395

Date Who What changed Old value New value Message
2021-01-27 08:39:28 bugproxy bug added bug
2021-01-27 08:39:30 bugproxy tags architecture-s39064 bugnameltc-190223 severity-high targetmilestone-inin---
2021-01-27 08:39:31 bugproxy attachment added 0001 - Backport of 53ba2eee52bf https://bugs.launchpad.net/bugs/1913395/+attachment/5457316/+files/0001-linux-headers-update-against-5.10-rc1.patch
2021-01-27 08:39:33 bugproxy attachment added 0004 - Backport of cd7498d07fbb https://bugs.launchpad.net/bugs/1913395/+attachment/5457317/+files/0004-s390x-pci-Add-routine-to-get-the-vfio-dma-available-.patch
2021-01-27 08:39:34 bugproxy attachment added 0005 - Backport of 37fa32de7073 https://bugs.launchpad.net/bugs/1913395/+attachment/5457318/+files/0005-s390x-pci-Honor-DMA-limits-set-by-vfio.patch
2021-01-27 08:39:36 bugproxy attachment added 0006 - Backport of 77280d33bc9c https://bugs.launchpad.net/bugs/1913395/+attachment/5457319/+files/0006-s390x-fix-build-for-without-default-devices.patch
2021-01-27 08:39:37 bugproxy ubuntu: assignee Skipper Bug Screeners (skipper-screen-team)
2021-01-27 08:39:40 bugproxy affects ubuntu linux (Ubuntu)
2021-01-27 09:34:43 Frank Heimes bug task added qemu (Ubuntu)
2021-01-27 09:35:11 Frank Heimes bug task added ubuntu-z-systems
2021-01-27 09:35:31 Frank Heimes ubuntu-z-systems: assignee Skipper Bug Screeners (skipper-screen-team)
2021-01-27 09:35:37 Frank Heimes linux (Ubuntu): assignee Skipper Bug Screeners (skipper-screen-team)
2021-01-27 09:35:45 Frank Heimes ubuntu-z-systems: importance Undecided High
2021-01-27 09:46:28 Frank Heimes bug task deleted linux (Ubuntu)
2021-01-27 09:46:45 Frank Heimes qemu (Ubuntu): assignee Canonical Server Team (canonical-server)
2021-01-27 09:47:00 Frank Heimes bug added subscriber Christian Ehrhardt 
2021-01-27 09:47:26 Frank Heimes tags architecture-s39064 bugnameltc-190223 severity-high targetmilestone-inin--- architecture-s39064 bugnameltc-190223 qemu-21.04 severity-high targetmilestone-inin---
2021-01-27 11:03:09 Christian Ehrhardt  qemu (Ubuntu): status New Fix Released
2021-01-27 11:03:13 Christian Ehrhardt  nominated for series Ubuntu Groovy
2021-01-27 11:03:13 Christian Ehrhardt  bug task added qemu (Ubuntu Groovy)
2021-01-27 11:03:13 Christian Ehrhardt  nominated for series Ubuntu Focal
2021-01-27 11:03:13 Christian Ehrhardt  bug task added qemu (Ubuntu Focal)
2021-01-27 11:03:19 Christian Ehrhardt  qemu (Ubuntu Focal): status New Triaged
2021-01-27 11:03:21 Christian Ehrhardt  qemu (Ubuntu Groovy): status New Triaged
2021-01-27 11:29:46 Frank Heimes ubuntu-z-systems: status New Triaged
2021-01-27 13:41:49 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/397012
2021-01-27 13:42:13 Launchpad Janitor merge proposal linked https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+merge/397013
2021-01-27 14:02:23 Christian Ehrhardt  description
Old value:
Description: s390x/pci: Honor vfio DMA limiting
Symptom: vfio-pci device on s390 enters error state
Problem: Kernel commit 492855939bdb added a limit to the number of concurrent DMA requests for a vfio container. However, lazy unmapping in s390 can in fact cause quite a large number of outstanding DMA requests to build up prior to being purged, potentially the entire guest DMA space. This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed: No space left on device'.
Solution: The solution requires a change to both kernel and qemu. For qemu, add functionality to get the number of allowable DMA requests via the VFIO_IOMMU_GET_INFO ioctl, and then ensure that the guest is told to refresh mappings before exceeding the vfio limit.
Reproduction: Put a vfio-pci device on s390 under I/O load.
This QEMU issue is related to the kernel issue in launchpad bug #1907421.
Backport patches have been attached for a subset of the required patches for this fix. The backports were needed for 3 major reasons:
1) For the header sync, I suspect you only want the minimal set of changes needed.
2) There is a missing upstream commit (408b55db8be3) that reorganizes the location of 2 s390-pci header files, causing conflicts.
3) Adjustments had to be made due to the QEMU build system change (meson).
I initially performed the backport against 4.2/focal-devel; the same patches and process also apply cleanly to 5.0/groovy-devel. Nothing should be required for hirsute, as everything is already in upstream QEMU 5.2.
In summary:
53ba2eee52bf: Backport as patch 0001. Rather than doing a full header sync, update ONLY the header change needed for the DMA fix. See attached patch 0001.
3ab7a0b40d4b: cherry-pick works
7486a62845b1: cherry-pick works
cd7498d07fbb: Backport as patch 0004. This upstream commit added a new part using meson, which does not exist in 5.0.
37fa32de7073: Backport as patch 0005. This was mainly due to conflicts with a missing patch that relocated some include files.
77280d33bc9c: Backport as patch 0006. This was due to the different build system; CONFIG_DEVICES doesn't exist.
As such, I have attached patches 0001, 0004, 0005 and 0006. Please cherry-pick patches 0002 and 0003.
To verify, I applied the provided patches and cherry-picks against both focal-devel and groovy-devel. In each case, for the host system I used the groovy kernel Frank provided in launchpad bug #1907421, which includes the kernel portion of this fix. Using these together, I verified that the DMA limit is read in and honored appropriately by QEMU, and I can no longer trigger an overrun of the DMA space when a guest pushes heavy data transfer via PCI (no errors in log, no transfer stalls). Also, as related to the last patch of the set, I further verified that no build errors are encountered when configured with --without-default-devices.
New value:
[Impact]
 * When a vfio-pci device on s390x is under I/O load, the device may end up in an error state.
 * Kernel commit 492855939bdb limits the number of concurrent DMA requests per vfio container. However, lazy unmapping in s390x can in fact cause quite a large number of outstanding DMA requests to build up prior to being purged, potentially the entire guest DMA space.
 * This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed: No space left on device'.
 * The solution requires a change to both kernel and qemu.
 * The qemu side of things is addressed by this SRU.
[Fix]
 * A patch series that uses the recent kernel additions: it checks the limits and refreshes mappings before they are exceeded.
[Test Case]
 * IBM Z or LinuxONE hardware with Ubuntu Server 20.10 installed.
 * PCIe adapters in place that provide vfio, like RoCE Express 2.
 * A KVM host needs to be set up, along with a KVM guest (again 20.10) that uses vfio.
 * Generate I/O that flows through the VF and watch out for errors like 'VFIO_MAP_DMA failed: No space left on device' in the log.
 * We don't have all of that in place; IBM will do these tests (as it has done on the related bug).
[Regression Potential]
 * This is split in two:
   - Generally, the vfio reworks, albeit small, could affect all platforms, so I'd expect issues, if any, in vfio use cases like device pass-through.
   - On s390x more was changed, but the regressions to look out for would still be in the same "vfio used for pass-through" use-case area.
[Other]
 * The kernel portion got accepted in bug 1907421.
---
Description: s390x/pci: Honor vfio DMA limiting
Symptom: vfio-pci device on s390 enters error state
Problem: Kernel commit 492855939bdb added a limit to the number of concurrent DMA requests for a vfio container. However, lazy unmapping in s390 can in fact cause quite a large number of outstanding DMA requests to build up prior to being purged, potentially the entire guest DMA space. This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA failed: No space left on device'.
Solution: The solution requires a change to both kernel and qemu. For qemu, add functionality to get the number of allowable DMA requests via the VFIO_IOMMU_GET_INFO ioctl, and then ensure that the guest is told to refresh mappings before exceeding the vfio limit.
Reproduction: Put a vfio-pci device on s390 under I/O load.
This QEMU issue is related to the kernel issue in launchpad bug #1907421.
Backport patches have been attached for a subset of the required patches for this fix. The backports were needed for 3 major reasons:
1) For the header sync, I suspect you only want the minimal set of changes needed.
2) There is a missing upstream commit (408b55db8be3) that reorganizes the location of 2 s390-pci header files, causing conflicts.
3) Adjustments had to be made due to the QEMU build system change (meson).
I initially performed the backport against 4.2/focal-devel; the same patches and process also apply cleanly to 5.0/groovy-devel. Nothing should be required for hirsute, as everything is already in upstream QEMU 5.2.
In summary:
53ba2eee52bf: Backport as patch 0001. Rather than doing a full header sync, update ONLY the header change needed for the DMA fix. See attached patch 0001.
3ab7a0b40d4b: cherry-pick works
7486a62845b1: cherry-pick works
cd7498d07fbb: Backport as patch 0004. This upstream commit added a new part using meson, which does not exist in 5.0.
37fa32de7073: Backport as patch 0005. This was mainly due to conflicts with a missing patch that relocated some include files.
77280d33bc9c: Backport as patch 0006. This was due to the different build system; CONFIG_DEVICES doesn't exist.
As such, I have attached patches 0001, 0004, 0005 and 0006. Please cherry-pick patches 0002 and 0003.
To verify, I applied the provided patches and cherry-picks against both focal-devel and groovy-devel. In each case, for the host system I used the groovy kernel Frank provided in launchpad bug #1907421, which includes the kernel portion of this fix. Using these together, I verified that the DMA limit is read in and honored appropriately by QEMU, and I can no longer trigger an overrun of the DMA space when a guest pushes heavy data transfer via PCI (no errors in log, no transfer stalls). Also, as related to the last patch of the set, I further verified that no build errors are encountered when configured with --without-default-devices.
2021-02-03 09:12:33 Christian Ehrhardt  tags architecture-s39064 bugnameltc-190223 qemu-21.04 severity-high targetmilestone-inin--- architecture-s39064 bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin---
2021-02-03 09:12:43 Christian Ehrhardt  bug added subscriber Ubuntu Server
2021-02-09 06:17:13 Christian Ehrhardt  qemu (Ubuntu Focal): status Triaged In Progress
2021-02-09 06:17:15 Christian Ehrhardt  qemu (Ubuntu Groovy): status Triaged In Progress
2021-02-09 06:32:30 Frank Heimes ubuntu-z-systems: status Triaged In Progress
2021-02-10 12:50:13 Christian Ehrhardt  tags architecture-s39064 bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin---
2021-02-10 14:56:15 Robie Basak qemu (Ubuntu Groovy): status In Progress Fix Committed
2021-02-10 14:56:16 Robie Basak bug added subscriber Ubuntu Stable Release Updates Team
2021-02-10 14:56:18 Robie Basak bug added subscriber SRU Verification
2021-02-10 14:56:21 Robie Basak tags architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- verification-needed verification-needed-groovy
2021-02-10 14:56:50 Robie Basak qemu (Ubuntu Focal): status In Progress Fix Committed
2021-02-10 14:56:55 Robie Basak tags architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- verification-needed verification-needed-groovy architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- verification-needed verification-needed-focal verification-needed-groovy
2021-02-10 17:41:43 Frank Heimes ubuntu-z-systems: status In Progress Fix Committed
2021-02-11 14:50:16 bugproxy tags architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- verification-needed verification-needed-focal verification-needed-groovy architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- verification-done-focal verification-done-groovy verification-needed
2021-02-12 08:30:25 bugproxy tags architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin--- verification-done-focal verification-done-groovy verification-needed architecture-s39064 block-proposed block-proposed-focal block-proposed-groovy bugnameltc-190223 qemu-21.04 server-next severity-high targetmilestone-inin2104 verification-done-focal verification-done-groovy verification-needed
2021-02-17 00:59:46 Mathew Hodson qemu (Ubuntu): importance Undecided High
2021-02-17 00:59:48 Mathew Hodson qemu (Ubuntu Focal): importance Undecided High
2021-02-17 00:59:51 Mathew Hodson qemu (Ubuntu Groovy): importance Undecided High
2021-02-22 16:03:08 Launchpad Janitor qemu (Ubuntu Groovy): status Fix Committed Fix Released
2021-02-22 16:03:08 Launchpad Janitor cve linked 2020-13754
2021-02-22 16:03:16 Launchpad Janitor qemu (Ubuntu Focal): status Fix Committed Fix Released
2021-02-22 16:24:40 Frank Heimes ubuntu-z-systems: status Fix Committed Fix Released