Assigning NVMe disk to a domain causes VFIO_MAP_DMA errors

Bug #1863333 reported by Michal Prívozník on 2020-02-14
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Philippe Mathieu-Daudé

Bug Description

I'm seeing some errors when assigning my NVMe disk to qemu. This is the full command line:

/home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 \
-name guest=fedora,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes \
-machine pc-i440fx-4.1,accel=kvm,usb=off,dump-guest-core=off \
-cpu host \
-m size=4194304k,slots=16,maxmem=1099511627776k \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-object iothread,id=iothread4 \
-mem-prealloc \
-mem-path /hugepages2M/libvirt/qemu/2-fedora \
-numa node,nodeid=0,cpus=0,mem=4096 \
-uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-boot menu=on,strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}' \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-2-format,id=scsi0-0-0-0,bootindex=1 \
-blockdev '{"driver":"nvme","device":"0000:02:00.0","namespace":1,"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0 \
-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
-device virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=35,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

And these are the errors I see:

2020-02-14T09:06:18.183167Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument
2020-02-14T09:10:49.753767Z qemu-system-x86_64: VFIO_MAP_DMA failed: Cannot allocate memory
2020-02-14T09:11:04.530344Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
2020-02-14T09:11:04.531087Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
2020-02-14T09:11:04.531230Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device

I'm doing nothing with the disk inside the guest, but:

  # dd if=/dev/vda of=/dev/null status=progress

(the disk appears as /dev/vda in the guest). Surprisingly, I do not see these errors when I use the traditional PCI assignment (-device vfio-pci). My versions of kernel and qemu:

moe ~ # uname -r
5.4.15-gentoo
moe ~ # /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 --version
QEMU emulator version 4.2.50 (v4.2.0-1439-g5d6542bea7-dirty)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

This seems to be due to the vfio-helper code assuming it can map an arbitrarily large IOVA range starting at 64K base address. x86 processors typically have a reserved range near the top of the 32-bit address space which is used for MSI support which is used by the interrupt remapper where we cannot create an overlapping DMA mapping window. Therefore once you have something approaching a 4G VM, you'll see the initial -EINVAL and I assume things fall apart from there. Changing the base IOVA address in vfio-helpers.c seems to be sufficient, ex:

#define QEMU_VFIO_IOVA_MIN 0x100000000ULL

This might be a sufficient legacy fix, but we do now expose valid IOVA ranges through the VFIO API which would allow this driver to dynamically pick IOVA ranges based on what the platform actually supports.

Changed in qemu:
assignee: nobody → Philippe Mathieu-Daudé (philmd)
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers