[UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive throughput degradation for PCI-related network workloads
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Committed | High | Skipper Bug Screeners |
linux (Ubuntu) | Status tracked in Oracular | | |
Noble | Fix Committed | Medium | Unassigned |
Oracular | Fix Committed | Medium | Canonical Kernel Team |
Bug Description
SRU Justification:
[Impact]
* With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
(upstream since kernel v6.7-rc1) there was a move (on s390x only)
to a different dma-iommu implementation.
* And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
(again upstream since v6.7-rc1) the IOMMU_DEFAULT_
option should now be set to 'yes' by default for s390x.
* Since CONFIG_
are related to each other CONFIG_
set to "no" by default, which was done upstream by b2b97a62f055
"Revert "s390: update defconfigs"".
* These changes are all upstream, but were not picked up by the Ubuntu
kernel config.
* And not having these config options set properly is causing significant
PCI-related network throughput degradation (up to -72%).
* This shows for almost all workloads and numbers of connections,
deteriorating with the number of connections increasing.
* Especially drastic is the drop for a high number of parallel connections
(50 and 250) and for small and medium-size transactional workloads.
However, also for streaming-type workloads the degradation is clearly
visible (up to 48% degradation).
[Fix]
* The (upstream accepted) fix is to set
IOMMU_
and
IOMMU_
(which is needed for the changed DMA IOMMU implementation since v6.7).
[Test Case]
* Set up two Ubuntu Server 24.04 systems (with kernel 6.8)
(one acting as server and one as client)
that have (PCIe attached) RoCE Express devices
and that are connected to each other.
* Verify whether the iommu_group type of the used PCI device is DMA-FQ:
cat /sys/bus/
DMA-FQ
* Sample workload rr1c-200x1000-250 with rr1c-200x1000-
<?xml version="1.0"?>
<profile name="TCP_RR">
<group nprocs="250">
</group>
</profile>
* Install uperf on both systems, client and server.
* Start uperf at server: uperf -s
* Start uperf at client: uperf -vai 5 -m uperf-profile.xml
* Switch from strict to lazy mode
either using the new kernel (or the test build below)
or using kernel cmd-line parameter iommu.strict=0.
* Restart uperf on server and client, like before.
* Verification will be performed by IBM.
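The iommu_group type check from the test case can be scripted so that it covers every group at once. The following is only a minimal sketch: it assumes the groups are exposed under /sys/kernel/iommu_groups with a `type` attribute (the path in the step above is truncated and may differ), and the function name `list_iommu_group_types` is hypothetical.

```shell
#!/bin/sh
# list_iommu_group_types [ROOT]
# Print "<group> <type>" for every IOMMU group below ROOT.
# ROOT defaults to /sys/kernel/iommu_groups (an assumed location).
list_iommu_group_types() {
    root="${1:-/sys/kernel/iommu_groups}"
    for type_file in "$root"/*/type; do
        # Skip the unexpanded glob pattern when no group exists.
        [ -r "$type_file" ] || continue
        group="$(basename "$(dirname "$type_file")")"
        printf '%s %s\n' "$group" "$(cat "$type_file")"
    done
}
```

Calling `list_iommu_group_types` before and after the switch should show whether the reported type of the RoCE device's group changed. As far as we understand, DMA-FQ denotes the flush-queue (lazy) mode and DMA the strict mode.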
[Regression Potential]
* There is a certain regression potential, since the behavior with
the two modified kernel config options will change significantly.
* This may solve the (network) throughput issue with PCI devices,
but may also come with side-effects on other PCIe based devices
(the old compression adapters or the new NVMe carrier cards).
[Other]
* CCW devices are not affected.
* This is s390x-specific only, hence will not affect any other architecture.
__________
Symptom:
Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating with the number of connections increasing. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, also for streaming-type workloads the degradation is clearly visible (up to 48% degradation).
Problem:
With kernel config setting CONFIG_
Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification.
The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks.
Upstream fix: https:/
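To see whether iommu.strict was passed on the kernel command line of a running system, something like the following can be used. This is a sketch, not part of the original analysis; the function name `iommu_strict_from_cmdline` is hypothetical. If the parameter is absent, the kernel falls back to the default chosen by its build-time config.

```shell
#!/bin/sh
# iommu_strict_from_cmdline [FILE]
# Print the value of the last iommu.strict= parameter found in FILE
# (default: /proc/cmdline), or "unset" if the parameter is absent.
iommu_strict_from_cmdline() {
    cmdline_file="${1:-/proc/cmdline}"
    # One parameter per line, keep only iommu.strict values, last one wins.
    value="$(tr ' ' '\n' < "$cmdline_file" \
        | sed -n 's/^iommu\.strict=//p' | tail -n 1)"
    printf '%s\n' "${value:-unset}"
}
```

An output of `0` corresponds to lazy mode and `1` to strict mode, matching the values used in the example measurements.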
Repro:
rr1c-200x1000-250 with rr1c-200x1000-
<?xml version="1.0"?>
<profile name="TCP_RR">
<group nprocs="250">
</group>
</profile>
0) Install uperf on both systems, client and server.
1) Start uperf at server: uperf -s
2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml
3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0.
4) Repeat steps 1) and 2).
Example:
For the following example, we chose the workload named above (rr1c-200x1000-
iommu.strict=1 (strict): 233464.914 TPS
iommu.strict=0 (lazy): 835123.193 TPS
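The two TPS figures can be related to the "up to -72%" quoted in the symptom with a little arithmetic. The helper below is an illustrative sketch only (the name `tps_comparison` is hypothetical); it takes the strict and lazy results as arguments.

```shell
#!/bin/sh
# tps_comparison STRICT_TPS LAZY_TPS
# Print how far strict mode falls below lazy mode (in percent)
# and the factor gained by switching from strict to lazy.
tps_comparison() {
    awk -v strict="$1" -v lazy="$2" 'BEGIN {
        printf "strict is %.1f%% below lazy\n", (1 - strict / lazy) * 100
        printf "lazy is %.2fx strict\n", lazy / strict
    }'
}

tps_comparison 233464.914 835123.193
# strict is 72.0% below lazy
# lazy is 3.58x strict
```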
tags: | added: architecture-s39064 bugnameltc-207082 severity-high targetmilestone-inin--- |
Changed in ubuntu: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
affects: | ubuntu → linux (Ubuntu) |
description: | updated |
Changed in ubuntu-z-systems: | |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
status: | New → In Progress |
description: | updated |
Changed in linux (Ubuntu Noble): | |
importance: | Undecided → Medium |
status: | New → Fix Committed |
Changed in linux (Ubuntu Oracular): | |
importance: | Undecided → Medium |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Noble): | |
status: | Fix Committed → In Progress |
Changed in linux (Ubuntu Noble): | |
status: | In Progress → Fix Committed |
Changed in ubuntu-z-systems: | |
status: | In Progress → Fix Committed |
I just had a look at the Ubuntu kernel noble master-next tree and can find commit:
"Revert "s390: update defconfigs"" under the hash b2b97a62f055
and I can see that it got reverted with that:
$ git show b2b97a62f055 | grep CONFIG_
CONFIG_
-CONFIG_
-CONFIG_
But git also tells me that it has been in since kernel v6.8, and with that since the first Ubuntu 6.8 kernel we had (Ubuntu-6.8.0-6), so it should also be in Ubuntu-6.8.0-31.
But it does not seem to be reflected in the kernel options of the Ubuntu kernel:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04 LTS
Release: 24.04
Codename: noble
$ uname -a
Linux hwe0008 6.8.0-31-generic #31-Ubuntu SMP Sat Apr 20 00:14:26 UTC 2024 s390x s390x s390x GNU/Linux
$ grep CONFIG_
CONFIG_
It is also not in the current, updated kernel:
Linux hwe0008 6.8.0-36-generic #36-Ubuntu SMP Mon Jun 10 09:59:13 UTC 2024 s390x s390x s390x GNU/Linux
$ grep CONFIG_
CONFIG_
For some reason the change in the upstream commit was not taken over into the Ubuntu kernel configs ...
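The config check can be repeated on any installed kernel with a small helper. This is a sketch under the assumption that the kernel config is installed as /boot/config-$(uname -r), as is usual on Ubuntu; the function name `iommu_default_config` is hypothetical, and it matches on the CONFIG_IOMMU_DEFAULT_ prefix only, so it makes no claim about which exact options exist.

```shell
#!/bin/sh
# iommu_default_config [CONFIG_FILE]
# Print every line of the kernel config that mentions a
# CONFIG_IOMMU_DEFAULT_* option, including "is not set" comments.
# CONFIG_FILE defaults to the config of the running kernel (assumed path).
iommu_default_config() {
    config_file="${1:-/boot/config-$(uname -r)}"
    grep 'CONFIG_IOMMU_DEFAULT_' "$config_file"
}
```

Running `iommu_default_config` after installing a fixed kernel and rebooting should show whether the defaults in the shipped config actually changed.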