Activity log for bug #2071471

Date Who What changed Old value New value Message
2024-06-28 13:59:17 bugproxy bug added bug
2024-06-28 13:59:19 bugproxy tags architecture-s39064 bugnameltc-207082 severity-high targetmilestone-inin---
2024-06-28 13:59:19 bugproxy ubuntu: assignee Skipper Bug Screeners (skipper-screen-team)
2024-06-28 13:59:44 bugproxy affects ubuntu linux (Ubuntu)
2024-06-28 14:07:18 Frank Heimes bug task added ubuntu-z-systems
2024-06-28 14:24:51 Frank Heimes ubuntu-z-systems: assignee Skipper Bug Screeners (skipper-screen-team)
2024-06-28 14:24:53 Frank Heimes linux (Ubuntu): assignee Skipper Bug Screeners (skipper-screen-team)
2024-06-28 14:25:01 Frank Heimes ubuntu-z-systems: importance Undecided High
2024-07-03 08:50:50 Frank Heimes description Symptom: Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). Problem: With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode changed from lazy to strict, causing these massive degradations. Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks. Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> 0) Install uperf on both systems, client and server. 1) Start uperf at server: uperf -s 2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml 3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0. 4) Repeat steps 1) and 2). Example: For the following example, we chose the workload named above (rr1c-200x1000-250): iommu.strict=1 (strict): 233464.914 TPS iommu.strict=0 (lazy): 835123.193 TPS SRU Justification: [Impact] * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer" (upstream since kernel v6.7-rc1) there was a move (on s390x only) to a different dma-iommu implementation. * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters" (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config option should now be set to 'yes' by default for s390x. * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be set to "no" by default, which was done upstream by b2b97a62f055 "Revert "s390: update defconfigs"". * These changes are all upstream, but were not picked up by the Ubuntu kernel config. * And not having these config options set properly is causing significant PCI-related network throughput degradation (up to -72%). * This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. * Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). [Fix] * The (upstream) fix is to set IOMMU_DEFAULT_DMA_STRICT=no and IOMMU_DEFAULT_DMA_LAZY=y (which is needed for the changed DMA IOMMU implementation since v6.7). 
[Test Case] * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8) (one acting as server and one as client) that have (PCIe attached) RoCE Express devices attached and that are connected to each other. * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> * Install uperf on both systems, client and server. * Start uperf at server: uperf -s * Start uperf at client: uperf -vai 5 -m uperf-profile.xml * Switch from strict to lazy mode either using the new kernel (or the test build below) or using kernel cmd-line parameter iommu.strict=0. * Restart uperf on server and client, like before. * Verification will be performed by IBM. [Regression Potential] * There is a certain regression potential, since the behavior with the two modified kernel config options will change significantly. * This may solve the (network) throughput issue with PCI devices, but may also come with side-effects on other PCIe-based devices (the old compression adapters or the new NVMe carrier cards). [Other] * CCW devices are not affected. * This is s390x-specific only, hence will not affect any other architecture. __________ Symptom: Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). Problem: With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode changed from lazy to strict, causing these massive degradations. Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks. Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> 0) Install uperf on both systems, client and server. 1) Start uperf at server: uperf -s 2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml 3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0. 4) Repeat steps 1) and 2). 
Example: For the following example, we chose the workload named above (rr1c-200x1000-250): iommu.strict=1 (strict): 233464.914 TPS iommu.strict=0 (lazy): 835123.193 TPS
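A minimal sketch of how the strict/lazy switch from the repro above can be checked and toggled on a running system, assuming an Ubuntu LPAR with the standard GRUB configuration and a RoCE function at the hypothetical PCI address 0001:00:00.0 (on s390x the command-line change may instead need to go through the platform's boot loader setup):
  # Show the flushing mode of the device's IOMMU group:
  # "DMA" means strict, "DMA-FQ" means lazy (flush queue).
  cat /sys/bus/pci/devices/0001:00:00.0/iommu_group/type
  # Check whether iommu.strict was passed on the kernel command line:
  cat /proc/cmdline
  # To verify lazy mode, add iommu.strict=0 to GRUB_CMDLINE_LINUX_DEFAULT
  # in /etc/default/grub, then regenerate the boot configuration and reboot:
  sudo update-grub
  sudo reboot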
2024-07-03 08:50:58 Frank Heimes ubuntu-z-systems: status New In Progress
2024-07-03 08:51:03 Frank Heimes linux (Ubuntu): status New In Progress
2024-07-03 08:56:16 Frank Heimes description SRU Justification: [Impact] * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer" (upstream since kernel v6.7-rc1) there was a move (on s390x only) to a different dma-iommu implementation. * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters" (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config option should now be set to 'yes' by default for s390x. * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be set to "no" by default, which was done upstream by b2b97a62f055 "Revert "s390: update defconfigs"". * These changes are all upstream, but were not picked up by the Ubuntu kernel config. * And not having these config options set properly is causing significant PCI-related network throughput degradation (up to -72%). * This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. * Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). [Fix] * The (upstream) fix is to set IOMMU_DEFAULT_DMA_STRICT=no and IOMMU_DEFAULT_DMA_LAZY=y (which is needed for the changed DMA IOMMU implementation since v6.7). [Test Case] * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8) (one acting as server and one as client) that have (PCIe attached) RoCE Express devices attached and that are connected to each other. * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> * Install uperf on both systems, client and server. * Start uperf at server: uperf -s * Start uperf at client: uperf -vai 5 -m uperf-profile.xml * Switch from strict to lazy mode either using the new kernel (or the test build below) or using kernel cmd-line parameter iommu.strict=0. * Restart uperf on server and client, like before. * Verification will be performed by IBM. [Regression Potential] * There is a certain regression potential, since the behavior with the two modified kernel config options will change significantly. * This may solve the (network) throughput issue with PCI devices, but may also come with side-effects on other PCIe-based devices (the old compression adapters or the new NVMe carrier cards). [Other] * CCW devices are not affected. * This is s390x-specific only, hence will not affect any other architecture. __________ Symptom: Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). 
Problem: With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode changed from lazy to strict, causing these massive degradations. Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks. Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> 0) Install uperf on both systems, client and server. 1) Start uperf at server: uperf -s 2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml 3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0. 4) Repeat steps 1) and 2). Example: For the following example, we chose the workload named above (rr1c-200x1000-250): iommu.strict=1 (strict): 233464.914 TPS iommu.strict=0 (lazy): 835123.193 TPS SRU Justification: [Impact] * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer" (upstream since kernel v6.7-rc1) there was a move (on s390x only) to a different dma-iommu implementation. * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters" (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config option should now be set to 'yes' by default for s390x. * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be set to "no" by default, which was done upstream by b2b97a62f055 "Revert "s390: update defconfigs"". * These changes are all upstream, but were not picked up by the Ubuntu kernel config. * And not having these config options set properly is causing significant PCI-related network throughput degradation (up to -72%). * This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. * Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). [Fix] * The (upstream accepted) fix is to set IOMMU_DEFAULT_DMA_STRICT=no and IOMMU_DEFAULT_DMA_LAZY=y (which is needed for the changed DMA IOMMU implementation since v6.7). [Test Case] * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8) (one acting as server and one as client) that have (PCIe attached) RoCE Express devices attached and that are connected to each other. 
* Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> * Install uperf on both systems, client and server. * Start uperf at server: uperf -s * Start uperf at client: uperf -vai 5 -m uperf-profile.xml * Switch from strict to lazy mode either using the new kernel (or the test build below) or using kernel cmd-line parameter iommu.strict=0. * Restart uperf on server and client, like before. * Verification will be performed by IBM. [Regression Potential] * There is a certain regression potential, since the behavior with the two modified kernel config options will change significantly. * This may solve the (network) throughput issue with PCI devices, but may also come with side-effects on other PCIe-based devices (the old compression adapters or the new NVMe carrier cards). [Other] * CCW devices are not affected. * This is s390x-specific only, hence will not affect any other architecture. __________ Symptom: Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). Problem: With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode changed from lazy to strict, causing these massive degradations. Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks. Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> 0) Install uperf on both systems, client and server. 1) Start uperf at server: uperf -s 2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml 3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0. 
4) Repeat steps 1) and 2). Example: For the following example, we chose the workload named above (rr1c-200x1000-250): iommu.strict=1 (strict): 233464.914 TPS iommu.strict=0 (lazy): 835123.193 TPS
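Whether a given Ubuntu kernel already carries the config change described in the [Fix] section above can be checked from the config file shipped with the kernel package; a small sketch (the /boot/config-$(uname -r) path follows the usual Ubuntu layout and is an assumption here):
  # On a fixed s390x kernel this should report CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
  # and show CONFIG_IOMMU_DEFAULT_DMA_STRICT as not set:
  grep -E 'CONFIG_IOMMU_DEFAULT_DMA_(STRICT|LAZY)' /boot/config-$(uname -r)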
2024-07-03 09:22:00 Frank Heimes linux (Ubuntu): assignee Canonical Kernel Team (canonical-kernel-team)
2024-07-03 11:16:34 Frank Heimes description SRU Justification: [Impact] * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer" (upstream since kernel v6.7-rc1) there was a move (on s390x only) to a different dma-iommu implementation. * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters" (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config option should now be set to 'yes' by default for s390x. * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be set to "no" by default, which was done upstream by b2b97a62f055 "Revert "s390: update defconfigs"". * These changes are all upstream, but were not picked up by the Ubuntu kernel config. * And not having these config options set properly is causing significant PCI-related network throughput degradation (up to -72%). * This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. * Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). [Fix] * The (upstream accepted) fix is to set IOMMU_DEFAULT_DMA_STRICT=no and IOMMU_DEFAULT_DMA_LAZY=y (which is needed for the changed DMA IOMMU implementation since v6.7). [Test Case] * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8) (one acting as server and one as client) that have (PCIe attached) RoCE Express devices attached and that are connected to each other. * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> * Install uperf on both systems, client and server. * Start uperf at server: uperf -s * Start uperf at client: uperf -vai 5 -m uperf-profile.xml * Switch from strict to lazy mode either using the new kernel (or the test build below) or using kernel cmd-line parameter iommu.strict=0. * Restart uperf on server and client, like before. * Verification will be performed by IBM. [Regression Potential] * There is a certain regression potential, since the behavior with the two modified kernel config options will change significantly. * This may solve the (network) throughput issue with PCI devices, but may also come with side-effects on other PCIe-based devices (the old compression adapters or the new NVMe carrier cards). [Other] * CCW devices are not affected. * This is s390x-specific only, hence will not affect any other architecture. __________ Symptom: Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). 
This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). Problem: With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode changed from lazy to strict, causing these massive degradations. Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks. Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> 0) Install uperf on both systems, client and server. 1) Start uperf at server: uperf -s 2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml 3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0. 4) Repeat steps 1) and 2). Example: For the following example, we chose the workload named above (rr1c-200x1000-250): iommu.strict=1 (strict): 233464.914 TPS iommu.strict=0 (lazy): 835123.193 TPS SRU Justification: [Impact] * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer" (upstream since kernel v6.7-rc1) there was a move (on s390x only) to a different dma-iommu implementation. * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters" (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config option should now be set to 'yes' by default for s390x. * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be set to "no" by default, which was done upstream by b2b97a62f055 "Revert "s390: update defconfigs"". * These changes are all upstream, but were not picked up by the Ubuntu kernel config. * And not having these config options set properly is causing significant PCI-related network throughput degradation (up to -72%). * This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. * Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). [Fix] * The (upstream accepted) fix is to set IOMMU_DEFAULT_DMA_STRICT=no and IOMMU_DEFAULT_DMA_LAZY=y (which is needed for the changed DMA IOMMU implementation since v6.7). 
[Test Case] * Set up two Ubuntu Server 24.04 systems (with kernel 6.8) (one acting as server and one as client) that have (PCIe attached) RoCE Express devices attached and that are connected to each other. * Verify that the iommu_group type of the used PCI device is DMA-FQ: cat /sys/bus/pci/devices/<device>\:00\:00.0/iommu_group/type DMA-FQ * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR"> <group nprocs="250"> <transaction iterations="1"> <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" /> </transaction> <transaction duration="300"> <flowop type="write" options="size=200"/> <flowop type="read" options="size=1000"/> </transaction> <transaction iterations="1"> <flowop type="disconnect" /> </transaction> </group> </profile> * Install uperf on both systems, client and server. * Start uperf at server: uperf -s * Start uperf at client: uperf -vai 5 -m uperf-profile.xml * Switch from strict to lazy mode either using the new kernel (or the test build below) or using kernel cmd-line parameter iommu.strict=0. * Restart uperf on server and client, like before. * Verification will be performed by IBM. [Regression Potential] * There is a certain regression potential, since the behavior with the two modified kernel config options will change significantly. * This may solve the (network) throughput issue with PCI devices, but may also come with side-effects on other PCIe-based devices (the old compression adapters or the new NVMe carrier cards). [Other] * CCW devices are not affected. * This is s390x-specific only, hence will not affect any other architecture. __________ Symptom: Comparing Ubuntu 24.04 (kernel version: 6.8.0-31-generic) against Ubuntu 22.04, all of our PCI-related network measurements on LPAR show massive throughput degradations (up to -72%). This shows for almost all workloads and numbers of connections, deteriorating as the number of connections increases. Especially drastic is the drop for a high number of parallel connections (50 and 250) and for small and medium-size transactional workloads. However, the degradation is also clearly visible for streaming-type workloads (up to 48%). Problem: With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode changed from lazy to strict, causing these massive degradations. Behavior can also be changed with a kernel commandline parameter (iommu.strict) for easy verification. The issue is known and was quickly fixed upstream in December 2023, after being present for a little less than two weeks. 
Upstream fix: https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a Repro: rr1c-200x1000-250 with rr1c-200x1000-250.xml: <?xml version="1.0"?> <profile name="TCP_RR">         <group nprocs="250">                 <transaction iterations="1">                         <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />                 </transaction>                 <transaction duration="300">                         <flowop type="write" options="size=200"/>                         <flowop type="read" options="size=1000"/>                 </transaction>                 <transaction iterations="1">                         <flowop type="disconnect" />                 </transaction>         </group> </profile> 0) Install uperf on both systems, client and server. 1) Start uperf at server: uperf -s 2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml 3) Switch from strict to lazy mode using kernel commandline parameter iommu.strict=0. 4) Repeat steps 1) and 2). Example: For the following example, we chose the workload named above (rr1c-200x1000-250): iommu.strict=1 (strict): 233464.914 TPS iommu.strict=0 (lazy): 835123.193 TPS
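The DMA-FQ verification step added to the test case in this revision can also be run across all PCI functions at once; a small sketch in shell (the exact device addresses depend on the LPAR's PCI topology):
  # Print the IOMMU translation mode for every PCI function;
  # "DMA-FQ" indicates lazy mode, "DMA" indicates strict mode.
  for d in /sys/bus/pci/devices/*; do
      printf '%s %s\n' "$(basename "$d")" "$(cat "$d/iommu_group/type" 2>/dev/null)"
  done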
2024-07-04 16:23:41 Stefan Bader nominated for series Ubuntu Noble
2024-07-04 16:23:41 Stefan Bader bug task added linux (Ubuntu Noble)
2024-07-04 16:23:41 Stefan Bader nominated for series Ubuntu Oracular
2024-07-04 16:23:41 Stefan Bader bug task added linux (Ubuntu Oracular)
2024-07-04 16:23:59 Stefan Bader linux (Ubuntu Noble): importance Undecided Medium
2024-07-04 16:23:59 Stefan Bader linux (Ubuntu Noble): status New Fix Committed
2024-07-04 16:24:15 Stefan Bader linux (Ubuntu Oracular): importance Undecided Medium
2024-07-04 16:24:28 Stefan Bader linux (Ubuntu Oracular): status In Progress Fix Committed
2024-07-04 16:24:33 Stefan Bader linux (Ubuntu Noble): status Fix Committed In Progress
2024-07-04 18:57:40 Stefan Bader linux (Ubuntu Noble): status In Progress Fix Committed
2024-07-04 19:29:16 Frank Heimes ubuntu-z-systems: status In Progress Fix Committed
2024-07-11 19:45:43 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 severity-high targetmilestone-inin--- architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin--- verification-needed-noble-linux
2024-07-19 07:29:30 bugproxy tags architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin--- verification-needed-noble-linux architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done verification-done-noble verification-needed-noble-linux
2024-08-02 15:32:35 Timo Aaltonen tags architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done verification-done-noble verification-needed-noble-linux architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done verification-done-noble verification-done-noble-linux
2024-08-06 20:56:20 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done verification-done-noble verification-done-noble-linux architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-noble-linux-ibm-gt-tdx
2024-08-08 09:47:07 Launchpad Janitor linux (Ubuntu Noble): status Fix Committed Fix Released
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-25742
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-35984
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-35990
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-35992
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-35997
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-36008
2024-08-08 09:47:07 Launchpad Janitor cve linked 2024-36016
2024-08-08 10:12:01 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-noble-linux-ibm-gt-tdx architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx
2024-08-08 10:14:59 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-raspi
2024-08-08 10:18:30 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-raspi architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-raspi
2024-08-08 11:15:17 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-raspi architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime
2024-08-08 12:45:07 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-lowlatency-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-lowlatency verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime
2024-08-09 03:27:31 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-lowlatency-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-lowlatency verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-ibm-6.8-v2 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-lowlatency-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-ibm-6.8 verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-lowlatency verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime
2024-08-12 11:36:32 Ubuntu Kernel Bot tags architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-ibm-6.8-v2 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-lowlatency-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-ibm-6.8 verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-lowlatency verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime architecture-s39064 bugnameltc-207082 kernel-spammed-jammy-linux-ibm-6.8-v2 kernel-spammed-jammy-linux-lowlatency-hwe-6.8-v2 kernel-spammed-jammy-linux-riscv-6.8-v2 kernel-spammed-noble-linux-azure-v2 kernel-spammed-noble-linux-ibm-gt-tdx-v2 kernel-spammed-noble-linux-lowlatency-v2 kernel-spammed-noble-linux-raspi-realtime-v2 kernel-spammed-noble-linux-raspi-v2 kernel-spammed-noble-linux-v2 severity-high targetmilestone-inin2404 verification-done-noble verification-done-noble-linux verification-needed-jammy-linux-ibm-6.8 verification-needed-jammy-linux-lowlatency-hwe-6.8 verification-needed-jammy-linux-riscv-6.8 verification-needed-noble-linux-azure verification-needed-noble-linux-ibm-gt-tdx verification-needed-noble-linux-lowlatency verification-needed-noble-linux-raspi verification-needed-noble-linux-raspi-realtime