By default VFIO disables mapping of MSIX BAR to the userspace as
the userspace may program it in a way allowing spurious interrupts;
instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
In order to eliminate guessing from the userspace about what is
mmapable, VFIO also advertises a sparse list of regions allowed to mmap.
This works fine as long as the system page size equals to the MSIX
alignment requirement which is 4KB. However with a bigger page size
the existing code prohibits mapping non-MSIX parts of a page with MSIX
structures so these parts have to be emulated via slow reads/writes on
a VFIO device fd. If these emulated bits are accessed often, this has
serious impact on performance.
This allows mmap of the entire BAR containing MSIX vector table.
This removes the sparse capability for PCI devices as it becomes useless.
As the userspace needs to know for sure whether mmapping of the MSIX
vector containing data can succeed, this adds a new capability -
VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace
that the entire BAR can be mmapped.
This does not touch the MSIX mangling in the BAR read/write handlers as
we are doing this just to enable direct access to non MSIX registers.
Signed-off-by: Alexey Kardashevskiy <email address hidden>
[aw - fixup whitespace, trim function name]
Signed-off-by: Alex Williamson <email address hidden>
Agreed on Frank's last comment.
commit a32295c612c5799 0d17fb0f41e7134 394b2f35f6
Author: Alexey Kardashevskiy <email address hidden>
Date: Wed Dec 13 00:31:31 2017
vfio-pci: Allow mapping MSIX BAR
By default VFIO disables mapping of MSIX BAR to the userspace as SET_IRQS ioctl.
the userspace may program it in a way allowing spurious interrupts;
instead the userspace uses the VFIO_DEVICE_
In order to eliminate guessing from the userspace about what is
mmapable, VFIO also advertises a sparse list of regions allowed to mmap.
This works fine as long as the system page size equals to the MSIX
alignment requirement which is 4KB. However with a bigger page size
the existing code prohibits mapping non-MSIX parts of a page with MSIX
structures so these parts have to be emulated via slow reads/writes on
a VFIO device fd. If these emulated bits are accessed often, this has
serious impact on performance.
This allows mmap of the entire BAR containing MSIX vector table.
This removes the sparse capability for PCI devices as it becomes useless.
As the userspace needs to know for sure whether mmapping of the MSIX REGION_ INFO_CAP_ MSIX_MAPPABLE - which explicitly tells the userspace
vector containing data can succeed, this adds a new capability -
VFIO_
that the entire BAR can be mmapped.
This does not touch the MSIX mangling in the BAR read/write handlers as
we are doing this just to enable direct access to non MSIX registers.
Signed-off-by: Alexey Kardashevskiy <email address hidden>
[aw - fixup whitespace, trim function name]
Signed-off-by: Alex Williamson <email address hidden>
Seems a case for HWE kernels.