[22.04 FEAT] Enhanced Interpretation for PCI Functions on s390x - kernel part
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | Medium | Skipper Bug Screeners |
linux (Ubuntu) | Fix Released | Medium | Canonical Kernel Team |
Jammy | Fix Released | Undecided | Canonical Kernel Team |
Kinetic | Won't Fix | Undecided | Unassigned |
Lunar | Fix Released | Undecided | Unassigned |
Mantic | Fix Released | Medium | Canonical Kernel Team |
Bug Description
[ Impact ]
* Currently the PCI passthrough implementation for s390x is based on
intercepting PCI I/O instructions, which leads to a reduced I/O performance
compared to the execution of PCI instructions directly in LPAR.
* Hence users may face I/O bottlenecks when using PCI devices in passthrough
mode based on the current implementation.
* To avoid this and to improve performance, interpretive execution
of the PCI store and PCI load instructions is enabled.
* A further improvement is achieved by enabling Adapter-Event-Notification
Interpretation (AENI).
* Since LTS releases are the main focus for stable and long running KVM
workloads, it is highly desired to get this backported to the jammy kernel
(and because the next LTS is still some time away).
[ Test Plan ]
* Hardware used: z14 or greater LPAR, PCI-attached devices
(RoCE VFs, ISM devices, NVMe drive)
* Setup: Both the kernel and QEMU features are needed for the feature
to function (an upstream QEMU can be used to verify the kernel early),
and the facility is only available on z14 or newer.
When any of those pieces is missing,
the interpretation facility will not be used.
When both the kernel and QEMU features are included in their respective
packages, and running in an LPAR on a z14 or newer machine,
this feature will be enabled automatically.
Existing supported devices should behave as before with no changes
required by an end-user (e.g. no changes to libvirt domain definitions)
-- but will now make use of the interpretation facility.
Additionally, ISM devices will now be eligible for vfio-pci passthrough
(where before QEMU would exit on error if attempting to provide an ISM
device for vfio-pci passthrough, preventing the guest from starting)
* Testing will include the following scenarios, repeated each for RoCE,
ISM and NVMe:
1) Testing of basic device passthrough (create a VM with a vfio-pci
device as part of the libvirt domain definition, passing through
a RoCE VF, an ISM device, or an NVMe drive. Verify that the device
is available in the guest and functioning)
2) Testing of device hotplug/unplug (create a VM with a vfio-pci device,
virsh detach-device to remove the device from the running guest,
verify the device is removed from the guest, then virsh attach-device
to hotplug the device to the guest again, verify the device functions
in the guest)
3) Host power off testing: Power off the device from the host, verify
that the device is unplugged from the guest as part of the poweroff
4) Guest power off testing: Power off the device from within the guest,
verify that the device is unusable in the guest,
power the device back on within the guest and verify that the device
is once again usable.
5) Guest reboot testing: (create a VM with a vfio-pci device,
verify the device is in working condition, reboot the guest,
verify that the device is still usable after reboot)
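As an illustration of scenarios 1) and 2), the following minimal Python sketch builds a libvirt `<hostdev>` element for a PCI function and lists the virsh commands for one unplug/replug cycle. The guest name, the XML file name and the PCI address are hypothetical placeholders, and the helper names `hostdev_xml` and `hotplug_cycle` are made up for this example; it only prints what would be run and is not the test harness actually used.

```python
import xml.etree.ElementTree as ET
from typing import List


def hostdev_xml(domain: int, bus: int, slot: int, function: int) -> str:
    """Return a libvirt <hostdev> element for a host PCI function (vfio-pci)."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain:04x}",
        bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}",
        function=f"0x{function:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")


def hotplug_cycle(guest: str, xml_path: str) -> List[List[str]]:
    """Return the virsh commands for one detach/attach cycle on a running guest."""
    return [
        ["virsh", "detach-device", guest, xml_path, "--live"],
        ["virsh", "attach-device", guest, xml_path, "--live"],
    ]


if __name__ == "__main__":
    # Hypothetical PCI address and guest name, for illustration only.
    print(hostdev_xml(0x0001, 0x00, 0x00, 0x0))
    for cmd in hotplug_cycle("guest1", "hostdev.xml"):
        print(" ".join(cmd))
```

Between the two virsh commands, the tester would check inside the guest that the device has disappeared, and after the attach that it functions again.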
Testing will include the following scenarios specifically for ISM devices:
1) Testing of SMC-D v1 fallback: Using 2 ISM devices on the same VCHID
that share a PNETID, create 2 guests and pass one ISM device
via vfio-pci device to each guest.
Establish TCP connectivity between the 2 guests using the libvirt
default network, and then use smc_run
(https:/
to run an iperf workload between the 2 guests (will include both
short workloads and longer-running workloads).
Verify that SMC-D transfer was used between the guests instead
of TCP via 'smcd stats'
(https:/
2) Testing of SMC-D v2: Same as above,
but using 2 ISM devices on the same VCHID that have no PNETID specified
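The ISM scenarios above could be driven roughly as in the following Python sketch, which only assembles the command lines for the iperf server guest, the iperf client guest and the subsequent 'smcd stats' check. The server address (chosen from the libvirt default network range) and the two durations are assumptions for illustration, and the helper names are hypothetical.

```python
from typing import List


def smc_iperf_server() -> List[str]:
    """Command for the guest acting as iperf server, started via smc_run."""
    return ["smc_run", "iperf", "-s"]


def smc_iperf_client(server_ip: str, seconds: int) -> List[str]:
    """Command for the guest acting as iperf client, running for `seconds`."""
    return ["smc_run", "iperf", "-c", server_ip, "-t", str(seconds)]


def smcd_stats() -> List[str]:
    """Command to inspect SMC-D statistics after a run."""
    return ["smcd", "stats"]


if __name__ == "__main__":
    # Assumed values: an address on the libvirt default network and
    # one short plus one longer-running workload.
    server_ip = "192.168.122.10"
    print(" ".join(smc_iperf_server()))
    for seconds in (10, 600):
        print(" ".join(smc_iperf_client(server_ip, seconds)))
        print(" ".join(smcd_stats()))
```

The statistics would then be inspected by hand to confirm that the traffic went over SMC-D rather than TCP.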
Testing will include the following scenarios specifically for RoCE devices:
1) Ping testing: Using 2 RoCE VFs that share a common network,
create 2 guests and pass one RoCE device to each guest.
Assign IP addresses within each guest to the associated TCP interface,
perform a ping between the guests to verify connectivity.
2) Iperf testing: Similar to the above, but instead establish an iperf
connection between the 2 guests and verify that the workload
is successful / no errors.
Will include both short workloads and longer-running workloads.
Testing will include the following scenario specifically for NVMe devices:
1) Fio testing: Using a NVMe drive passed to the guest via vfio-pci,
run a series of fio tests against the device from within the guest,
verifying that the workload is successful / no errors.
Will include both short workloads and longer-running workloads.
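A fio series like the one described for NVMe could be assembled as in the sketch below. The device path, the workload mix and the run times are assumptions for illustration, not the values used in the actual verification, and `fio_command` is a hypothetical helper. Only read workloads are shown, since write workloads against a raw device destroy its contents.

```python
from typing import List


def fio_command(device: str, rw: str, runtime_s: int,
                block_size: str = "4k") -> List[str]:
    """Build a time-based fio invocation against a block device."""
    return [
        "fio",
        f"--name={rw}-{runtime_s}s",
        f"--filename={device}",
        f"--rw={rw}",
        f"--bs={block_size}",
        "--ioengine=libaio",
        "--direct=1",      # bypass the page cache so the device is exercised
        "--time_based",    # keep running until --runtime expires
        f"--runtime={runtime_s}",
    ]


if __name__ == "__main__":
    # Assumed device node inside the guest; short and longer-running jobs.
    for rw in ("read", "randread"):
        for runtime_s in (60, 1800):
            print(" ".join(fio_command("/dev/nvme0n1", rw, runtime_s)))
```

A non-zero fio exit status or errors in the guest kernel log would count as a failed run.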
[ Where problems could occur ]
* The modifications do not change the way users or APIs have to make
use of PCI passthrough, only the internal implementation got modified.
* The vast majority of the code changes and additions are s390x-specific,
under arch/s390 and drivers/s390.
* However there is also common code touched:
* 'kvm: use kvfree() in kvm_arch_free_vm()' touches
arch/
arch/
include/
kvfree(), allowing use of the common variant; this has been upstream since v5.16
and is therefore well established.
* And 'vfio-pci/zdev: add open/close device hooks' touches
drivers/
include/
It has been upstream since kernel 6.0.
* 'KVM: s390: pci: provide routines for en-/disabling interrupt forwarding'
expands a single #if statement in include/
* 'KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices'
adds the s390x-specific KVM_S390_ZPCI_OP and its definition to
include/
* And 'vfio-pci/zdev: different maxstbl for interpreted devices' and
'vfio-pci/zdev: add function handle to clp base capability' expand
s390x-specific (aka z-specific aka zdev) device structs in
include/
* This shows that the vast majority of modifications are s390x specific,
even in most of the common code files.
* The remaining modifications in the (generally) common code files are
related to the newly introduced kernel option 'CONFIG_
and documentation.
* The s390x changes are more significant, and could not only harm
passthrough itself for zPCI devices, but also KVM virtualization in general.
* In addition to these kernel changes, qemu modifications are needed
as well (addressed in LP#1853307), so this modified kernel
must be tested in combination with the updated qemu package.
- The qemu autopkgtest will be a good fit to identify any regressions,
also in the kernel.
- In addition, some passthrough-related tests will be done by IBM.
__________
The PCI Passthrough implementation is based on intercepting PCI I/O instructions which leads to a reduced I/O performance compared to execution of PCI instructions in LPAR.
For improved performance, interpretive execution of the PCI store and PCI load instructions is enabled.
Further improvement is achieved by enabling the Adapter-
tags: | added: architecture-s39064 bugnameltc-182254 severity-high targetmilestone-inin2004 |
Changed in ubuntu: | |
assignee: | nobody → Skipper Bug Screeners (skipper-screen-team) |
affects: | ubuntu → linux (Ubuntu) |
summary: |
- [20.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part + [20.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part |
summary: |
- [20.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part + [21.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part |
summary: |
- [21.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part + [21.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part |
summary: |
- [21.10 FEAT] Enhanced Interpretation for PCI Functions - kernel part + [22.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part |
summary: |
- [22.04 FEAT] Enhanced Interpretation for PCI Functions - kernel part + [22.04 FEAT] Enhanced Interpretation for PCI Functions on s390x - kernel + part |
Changed in ubuntu-z-systems: | |
status: | Fix Committed → Fix Released |
Changed in linux (Ubuntu Lunar): | |
status: | New → Fix Released |
Changed in linux (Ubuntu Kinetic): | |
status: | New → Won't Fix |
Changed in linux (Ubuntu Jammy): | |
status: | New → In Progress |
description: | updated |
Changed in linux (Ubuntu Jammy): | |
status: | In Progress → Fix Committed |
description: | updated |
tags: |
added: verification-done-focal-linux-aws-5.15 verification-done-jammy-linux-aws verification-done-jammy-linux-xilinx-zynqmp removed: verification-needed-focal-linux-aws-5.15 verification-needed-jammy-linux-aws verification-needed-jammy-linux-xilinx-zynqmp |
Please specify the target kernel version in which this is planned to be accepted upstream.
Changing to Incomplete for now.