This is repeatable, and it explains why I haven't seen exactly the same thing here as on x86: VFIO_IOMMU_SPAPR_REGISTER_MEMORY is ppc-specific (but shows the same long hang behavior).
At least for ppc, the kernel documentation already says:
460 - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
461 receive a user space address and size of the block to be pinned.
462 Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
463 be called with the exact address and size used for registering
464 the memory block. The userspace is not expected to call these often.
465 The ranges are stored in a linked list in a VFIO container.
The size appears to cover all of guest memory, as I see:
reg.vaddr = (uintptr_t) vfio_prereg_gpa_to_vaddr(section, gpa);
reg.size = end - gpa;
GDB confirms that this is ALL of the guest's memory (which explains why the hang scales with memory size):
78 ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);
(gdb) p reg.size/1024/1024
$3 = 131072
Being "non-bisectable" is the bad part here.
It might be possible to split this in the kernel, but QEMU cannot do much about it, since it has to register a single range.
Now that this is confirmed, let's look for the equivalent on x86.
I built QEMU HEAD from git:
$ export CFLAGS="-O0 -g"
$ ./configure --disable-user --disable-linux-user --disable-docs --disable-guest-agent --disable-sdl --disable-gtk --disable-vnc --disable-xen --disable-brlapi --enable-fdt --disable-bluez --disable-vde --disable-rbd --disable-libiscsi --disable-libnfs --disable-libusb --disable-usb-redir --disable-seccomp --disable-glusterfs --disable-tpm --disable-numa --disable-slirp --disable-blobs --target-list=ppc64-softmmu
$ make -j
$ virsh nodedev-detach pci_0005_01_00_0 --driver vfio
$ virsh nodedev-detach pci_0005_01_00_1 --driver vfio
$ virsh nodedev-detach pci_0005_01_00_2 --driver vfio
$ virsh nodedev-detach pci_0005_01_00_3 --driver vfio
$ virsh nodedev-detach pci_0005_01_00_4 --driver vfio
$ virsh nodedev-detach pci_0005_01_00_5 --driver vfio
$ sudo ppc64-softmmu/qemu-system-ppc64 \
    -machine pseries-4.1,accel=kvm,usb=off,dump-guest-core=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken \
    -name guest=test-vfio-slowness -m 131072 -smp 1 -no-user-config \
    -device spapr-pci-host-bridge,index=1,id=pci.1 \
    -drive file=/var/lib/uvtool/libvirt/images/test-huge-mem-init.qcow,format=qcow2,if=none,id=drive-virtio-disk0 \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -device vfio-pci,host=0005:01:00.0,id=hostdev0,bus=pci.1.0,addr=0x1 \
    -device vfio-pci,host=0005:01:00.1,id=hostdev1,bus=pci.1.0,addr=0x2 \
    -device vfio-pci,host=0005:01:00.2,id=hostdev2,bus=pci.1.0,addr=0x3 \
    -device vfio-pci,host=0005:01:00.3,id=hostdev3,bus=pci.1.0,addr=0x4 \
    -device vfio-pci,host=0005:01:00.4,id=hostdev4,bus=pci.1.0,addr=0x5 \
    -device vfio-pci,host=0005:01:00.5,id=hostdev5,bus=pci.1.0,addr=0x6 \
    -msg timestamp=on -display curses
I found the slow call, VFIO_IOMMU_SPAPR_REGISTER_MEMORY:
96783 0.000088 readlink("/sys/bus/pci/devices/0005:01:00.0/iommu_group", "../../../../kernel/iommu_groups/"..., 4096) = 33 <0.000022>
96783 0.000066 openat(AT_FDCWD, "/dev/vfio/8", O_RDWR|O_CLOEXEC) = 16 <0.000025>
96783 0.000059 ioctl(16, VFIO_GROUP_GET_STATUS, 0x7fffe3fd6e20) = 0 <0.000018>
96783 0.000050 openat(AT_FDCWD, "/dev/vfio/vfio", O_RDWR|O_CLOEXEC) = 17 <0.000014>
96783 0.000049 ioctl(17, VFIO_GET_API_VERSION, 0) = 0 <0.000008>
96783 0.000039 ioctl(17, VFIO_CHECK_EXTENSION, 0x3) = 0 <0.000011>
96783 0.000040 ioctl(17, VFIO_CHECK_EXTENSION, 0x1) = 0 <0.000008>
96783 0.000037 ioctl(17, VFIO_CHECK_EXTENSION, 0x7) = 1 <0.000008>
96783 0.000037 ioctl(16, VFIO_GROUP_SET_CONTAINER, 0x65690e1bb48) = 0 <0.000008>
96783 0.000037 ioctl(17, VFIO_SET_IOMMU, 0x7) = 0 <0.000039>
96783 0.000070 ioctl(17, VFIO_IOMMU_SPAPR_REGISTER_MEMORY <unfinished ...>
96785 10.019032 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <10.022751>
96785 0.053520 madvise(0x7fabd1f60000, 8257536, MADV_DONTNEED) = 0 <0.000020>
96785 0.007283 exit(0) = ?
96785 0.000072 +++ exited with 0 +++
96783 276.894553 <... ioctl resumed> , 0x7fffe3fd6b70) = 0 <286.974436>
96783 0.000107 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---