Comment 28 for bug 1838575

Christian Ehrhardt (paelzer) wrote:

I modified the kernel to make a few functions non-inlined, so they are better traceable (a quick sanity check follows the list):
vfio_dma_do_map
vfio_dma_do_unmap
mutex_lock
mutex_unlock
kzalloc
vfio_link_dma
vfio_pin_map_dma
vfio_pin_pages_remote
vfio_iommu_map
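
If the change took effect, those symbols should now show up in ftrace's list of traceable functions. A minimal check (the tracefs path may be /sys/kernel/tracing on newer kernels):

$ grep -wE 'vfio_pin_pages_remote|vfio_iommu_map|vfio_link_dma' \
    /sys/kernel/debug/tracing/available_filter_functions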

Then I ran tracing on this load, limited to the functions of interest:
$ sudo trace-cmd record -p function_graph -l vfio_iommu_type1_ioctl -l vfio_pci_ioctl -l vfio_set_irqs_validate_and_prepare -l vfio_pci_set_irqs_ioctl -l pci_probe_reset_slot -l pci_probe_reset_bus -l vfio_pci_count_devs -l vfio_pci_fill_devs -l vfio_pci_validate_devs -l vfio_group_get_external_user -l vfio_group_put_external_user -l vfio_external_user_iommu_id -l vfio_dma_do_map -l vfio_dma_do_unmap -l mutex_lock -l mutex_unlock -l vfio_link_dma -l vfio_pin_map_dma -l vfio_pin_pages_remote -l vfio_iommu_map

I traced this twice to compare:
- after a clean boot; duration ~12 seconds
- right afterwards (rerun from gdb); duration ~175 seconds
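
To keep the two runs apart for postprocessing, trace-cmd can write each recording to its own file; the file names here are just examples, and the -l list is shortened to two entries for readability:

$ sudo trace-cmd record -o clean-boot.dat -p function_graph -l vfio_dma_do_map -l mutex_lock
$ sudo trace-cmd record -o rerun.dat -p function_graph -l vfio_dma_do_map -l mutex_lock
$ trace-cmd report -i clean-boot.dat > clean-boot.txt
$ trace-cmd report -i rerun.dat > rerun.txt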

It shows that the locks are interesting for concurrency considerations, but here they are mostly noise.
OTOH it is interesting that the slow case has many more of them; maybe the code needs to unlock more often in between for interactivity. I'll count them in postprocessing like:
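
(a minimal sketch, assuming the per-run report files from above)

$ trace-cmd report -i clean-boot.dat | grep -c mutex_lock
$ trace-cmd report -i rerun.dat | grep -c mutex_lock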

The result shows that the locking involved scaled much more than the runtime:
the duration went up by ~14.5x, but the locks by ~34x; the average lock activity between vfio calls grew even more, from 1.52 to 6.32.
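
That per-call average can be derived with a small awk pass over the report (a sketch, assuming one event per line in the report output):

$ trace-cmd report -i rerun.dat | awk '
    /vfio_/      { if (seen) { sum += locks; n++ }; seen = 1; locks = 0; next }
    /mutex_lock/ { locks++ }
    END          { if (n) printf "avg locks between vfio calls: %.2f\n", sum / n }'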

But the calls to vfio_pin_pages_remote / vfio_iommu_map went up as well, from 64020 to 327852 (~5x). Since the total amount of guest memory to map is unchanged, I guess that means it is being pinned and mapped in smaller chunks.
The timing of these functions also changes slightly; early in the trace it is much different than later on.
Let's take a clean shot at just those durations and sizes:
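
A narrowed re-run along these lines should isolate just those two functions (a sketch; the funcgraph output in the report carries the per-call durations, while the mapped sizes are not in the function trace and would need extra instrumentation, e.g. kprobes on the size arguments):

$ sudo trace-cmd record -p function_graph -l vfio_pin_pages_remote -l vfio_iommu_map
$ trace-cmd report | grep -E 'vfio_pin_pages_remote|vfio_iommu_map'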