Hello again,

I was finally able to get back to this. I was finishing a patch that creates the open+truncate+mmap+unlink mechanism on files specified by the "vhostlog" parameter of tap devices. The patch is done; the problem is that "memfd" appears to be used only for shared logs, and vhost-net (used for tap devices) doesn't use it. See the following:

(scenario 1)

Linux kvm01 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

with:

-netdev tap,id=net0,vhost=on
-device virtio-net-pci,netdev=net0,id=net0,mac=52:54:00:20:c5:42,bus=pci.0,addr=0x3

## kvm01

$ ./instance.sh
qemu_memfd_check
qemu_memfd_alloc: enter
qemu_memfd_alloc: memfd_create with no sealing
qemu_memfd_alloc: memfd_create worked, truncating...
qemu_memfd_alloc: mmaping
qemu_memfd_free: enter
qemu_memfd_check: ok
vhost_dev_start: enter
vhost_log_get: enter
vhost_log_alloc: enter
vhost_log_alloc: local
vhost_log_get: not shared
vhost_log_put: enter
vhost_log_put: enter
vhost_log_put: local free

(qemu) migrate -d tcp:kvm02:4444

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compres
Migration status: completed
total time: 14586 milliseconds
downtime: 10 milliseconds
setup: 20 milliseconds
transferred ram: 377224 kbytes
throughput: 212.02 mbps
remaining ram: 0 kbytes
total ram: 4001544 kbytes
duplicate: 908879 pages
skipped: 0 pages
normal: 92129 pages
normal bytes: 368516 kbytes
dirty sync count: 4

## kvm02

$ ./instance.sh
qemu_memfd_check
qemu_memfd_alloc: enter
qemu_memfd_alloc: memfd_create with no sealing
qemu_memfd_alloc: memfd_create worked, truncating...
qemu_memfd_alloc: mmaping
qemu_memfd_free: enter
qemu_memfd_check: ok
vhost_dev_start: enter

(scenario 2)

Linux kvm01 3.13.0-99-generic #146-Ubuntu SMP Wed Oct 12 20:56:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

with:

-netdev tap,id=net0,vhost=on
-device virtio-net-pci,netdev=net0,id=net0,mac=52:54:00:20:c5:42,bus=pci.0,addr=0x3

## kvm01

$ ./instance.sh
qemu_memfd_check
qemu_memfd_alloc: enter
qemu_memfd_alloc: memfd_create with no sealing
qemu_memfd_alloc: memfd_create failed #2
qemu_memfd_alloc: fallback
qemu_memfd_alloc: fname = /tmp/memfd-XXXXXX
qemu_memfd_alloc: fallback truncating
qemu_memfd_alloc: mmaping
qemu_memfd_free
qemu_memfd_check: ok
vhost_dev_start: enter
vhost_log_get: enter
vhost_log_alloc: enter
vhost_log_alloc: local
vhost_log_get: not shared
vhost_log_put: enter
vhost_log_put: enter
vhost_log_put: local free

(qemu) migrate -d tcp:kvm02:4444

(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compres
Migration status: completed
total time: 15400 milliseconds
downtime: 9 milliseconds
setup: 5 milliseconds
transferred ram: 375812 kbytes
throughput: 199.99 mbps
remaining ram: 0 kbytes
total ram: 4001544 kbytes
duplicate: 909186 pages
skipped: 0 pages
normal: 91776 pages
normal bytes: 367104 kbytes
dirty sync count: 3

## kvm02

$ ./instance.sh
qemu_memfd_check
qemu_memfd_alloc: enter
qemu_memfd_alloc: memfd_create with no sealing
qemu_memfd_alloc: memfd_create failed #2
qemu_memfd_alloc: fallback
qemu_memfd_alloc: fname = /tmp/memfd-XXXXXX
qemu_memfd_alloc: fallback truncating
qemu_memfd_alloc: mmaping
qemu_memfd_free
qemu_memfd_check: ok
vhost_dev_start: enter

For kvm01, we have 2 parts:

(1) From "-netdev tap,id=net0,vhost=on":

- net_init_clients()
- net_init_client()
- net_client_init()
- net_client_init1()
- net_client_init_fun() .. net_init_tap() in my case
- net_init_tap_one()
- vhost_net_init()
- vhost_dev_init()
- migration checks (host feature, memfd functional test)

(2) From "-device virtio-net-pci,netdev=net0...":

- virtio_pci_device_plugged()
- virtio_pci_modern_regions_init()
- virtio_pci_common_write()
- virtio_set_status()
- virtio_net_set_status()
- virtio_net_vhost_status()
- vhost_net_start()
- vhost_net_start_one()
- vhost_dev_start()
- does the log allocation logic

It looks like "vhost_requires_shm_log" isn't defined by my underlying VHOST driver (vhost-net in my case). It seems that vhost-user defines it (from VhostOps user_ops).

Judging by the outputs above, vhost_dev_log_is_shared seems to be returning false, which makes (2) - vhost_dev_start - use a different log allocation (malloc) than the one that was tested for allowing migrations at (1) - vhost_dev_init.

Question: Why check for "memfd" when it is not yet certain that a shared descriptor and memory pointer are going to be needed for the migration to happen? Do you want me to change that? If memfd fails but the guest in question is using regular "malloc" for the vhost log, we are marking it as unable to live migrate by mistake. I could check for the vhost_requires_shm_log pointer during vhost_dev_init (coming from tap).

Also, if possible, I would like comments on a draft:

https://pastebin.canonical.com/168579/

(please disregard printfs and minor problems)

OBS: I'm basically removing the fallback mechanism from memfd, creating a generic qemu_mmap_XXX implementation, adding a vhostlog parameter to the tap cmdline AND changing the decision on what to use: if vhostlog is present in the cmdline, qemu_mmap_XXX on vhostlog is used. If it is a directory, a random file is created inside it. If it is a file, the file is used.
If no vhostlog is given (the default until libvirt is changed), it first tries to use memfd (all newer kernels) and, if that is not possible, falls back to the qemu_mmap mechanism on the "tmp" directory, creating random files.

PS: Remember that this is because of selinux/apparmor labelling on tmp files (and because file descriptors can be passed around, like we discussed before).

If that is okay I'll provide a patch asap. Let me know if you prefer something else.

Thank you,
Rafael

> On Oct 04, 2016, at 12:29, Rafael David Tinoco