vfio-pci passed Radeon 7870XT is unstable on first boot of a Windows 8.1 guest

Bug #1265998 reported by Michał Węgrzynek
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
QEMU      Fix Released   Undecided    Unassigned

Bug Description

I'm passing a Radeon 7870XT through to a Windows 8.1 guest. It works flawlessly (I tested it with a 12-hour Furmark run), but only on the second launch of the guest. On the first launch, any 3D operation in the guest (even showing a search box in Chrome) causes screen corruption, and the guest becomes unusable, with constant driver resets, until it grinds to a halt. At that point I get about 1,000 kernel log entries similar to:

sty 04 11:12:57 miner kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0002 address=0x000000007bc4c100 flags=0x0010]
sty 04 11:12:57 miner kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0002
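To gauge the size of such a burst, the fault lines for the passed-through device can be counted. The sketch below is a hypothetical helper (count_io_page_faults is not part of any existing tool); it assumes the AMD-Vi message format shown above and reads kernel log text on standard input.

```shell
# Hypothetical helper: count AMD-Vi IO_PAGE_FAULT events for one PCI device.
# $1 is the device exactly as AMD-Vi prints it, e.g. 01:00.0.
# Reads kernel log text on stdin and prints the number of matching lines.
count_io_page_faults() {
  # -F treats the pattern literally, so the dots in 01:00.0 are not wildcards;
  # the trailing space anchors the end of the device field.
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps status 0.
  grep -cF "IO_PAGE_FAULT device=$1 " || true
}
```

For example: journalctl -k -b | count_io_page_faults 01:00.0 (or dmesg | count_io_page_faults 01:00.0).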

When I abort the first launch of the guest with the monitor "quit" command during guest BIOS initialisation, on the second launch I get the following errors in the QEMU monitor:

qemu-system-x86_64: vfio_dma_map(0x7f4b93fbbd40, 0x0, 0xb0000000, 0x2aaac0000000) = -16 (Device or resource busy)
qemu-system-x86_64: vfio_dma_map(0x7f4b93fbbd40, 0xc0000, 0xaff40000, 0x2aaac00c0000) = -16 (Device or resource busy)
qemu-system-x86_64: vfio_dma_map(0x7f4b93fbbd40, 0xc8000, 0xaff38000, 0x2aaac00c8000) = -16 (Device or resource busy)
qemu-system-x86_64: vfio_dma_map(0x7f4b93fbbd40, 0xd0000, 0xaff30000, 0x2aaac00d0000) = -16 (Device or resource busy)
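Assuming the arguments in these messages are (container, guest IOVA, size, host virtual address), which is how QEMU 1.7's VFIO code appears to format them, and noting that -16 is -EBUSY, a little arithmetic shows that all four failing mappings end at the same guest address, 0xb0000000, and share one host buffer (vaddr minus IOVA is 0x2aaac0000000 in each line). vfio_map_end is a hypothetical helper for that arithmetic, not part of QEMU.

```shell
# Hypothetical helper: given one "vfio_dma_map(container, iova, size, vaddr) = err"
# line from QEMU, print the end address (iova + size) of the failed mapping in hex.
# Assumes the argument order described above.
vfio_map_end() {
  # Extract the text between "vfio_dma_map(" and the closing ")".
  args=$(printf '%s\n' "$1" | sed -n 's/.*vfio_dma_map(\([^)]*\)).*/\1/p')
  [ -n "$args" ] || return 1
  # Split "a, b, c, d" on commas and spaces.
  IFS=', ' read -r container iova size vaddr <<EOF
$args
EOF
  # Shell arithmetic accepts 0x-prefixed constants.
  printf '0x%x\n' "$(( $iova + $size ))"
}
```

Passing any of the four lines above to vfio_map_end prints 0xb0000000.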

and the following entries in the kernel log:

sty 01 22:04:53 miner kernel: vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1b@0x2d0
sty 01 22:04:55 miner kernel: ------------[ cut here ]------------
sty 01 22:04:55 miner kernel: WARNING: CPU: 3 PID: 1012 at drivers/vfio/vfio_iommu_type1.c:685 vfio_dma_do_map+0x43c/0x838 [vfio_iommu_type1]()
sty 01 22:04:55 miner kernel: Modules linked in: tun bridge stp llc bnep vfio_pci vfio_iommu_type1 vfio fuse ts2020 ds3000 dvb_usb_dw2102 kvm_amd kvm crc32_pclmul c...luetooth rc_
sty 01 22:04:55 miner kernel: CPU: 3 PID: 1012 Comm: qemu-system-x86 Not tainted 3.13.0-2-mainline #1
sty 01 22:04:55 miner kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A85X Extreme6, BIOS P2.30 07/11/2013
sty 01 22:04:55 miner kernel: 0000000000000009 ffff8804311b9d20 ffffffff814ddc72 0000000000000000
sty 01 22:04:55 miner kernel: ffff8804311b9d58 ffffffff8105f2ed 00000000fffffff0 00000000b0000000
sty 01 22:04:55 miner kernel: 00000000000b0000 00000000b0000000 0000000000000000 ffff8804311b9d68
sty 01 22:04:55 miner kernel: Call Trace:
sty 01 22:04:55 miner kernel: [<ffffffff814ddc72>] dump_stack+0x4d/0x6f
sty 01 22:04:55 miner kernel: [<ffffffff8105f2ed>] warn_slowpath_common+0x7d/0xa0
sty 01 22:04:55 miner kernel: [<ffffffff8105f3ca>] warn_slowpath_null+0x1a/0x20
sty 01 22:04:55 miner kernel: [<ffffffffa0ce9dec>] vfio_dma_do_map+0x43c/0x838 [vfio_iommu_type1]
sty 01 22:04:55 miner kernel: [<ffffffffa0cea3f9>] vfio_iommu_type1_ioctl+0x211/0x288 [vfio_iommu_type1]
sty 01 22:04:55 miner kernel: [<ffffffffa0ce0e28>] vfio_fops_unl_ioctl+0x78/0x340 [vfio]
sty 01 22:04:55 miner kernel: [<ffffffff811adfb8>] do_vfs_ioctl+0x2d8/0x4b0
sty 01 22:04:55 miner kernel: [<ffffffff811ae211>] SyS_ioctl+0x81/0xa0
sty 01 22:04:55 miner kernel: [<ffffffff814ebd2d>] system_call_fastpath+0x1a/0x1f
sty 01 22:04:55 miner kernel: ---[ end trace 53ac540a2e8783dc ]---
sty 01 22:04:56 miner kernel: ------------[ cut here ]------------

but after that the guest works flawlessly (even across guest reboots).

qemu --version:
QEMU emulator version 1.7.50, Copyright (c) 2003-2008 Fabrice Bellard
(a build of the Arch qemu-git package from 2013-12-24; the errors are the same with stock QEMU 1.7.0)

uname -a:
Linux miner 3.13.0-2-mainline #1 SMP PREEMPT Wed Jan 1 14:37:23 CET 2014 x86_64 GNU/Linux
(Arch linux-mainline package with 3.13-rc5; the errors are exactly the same with the stock 3.12.6-1 kernel)

kernel command line:
Command line: BOOT_IMAGE=/vmlinuz-linux-mainline root=UUID=4598b2bf-a2a9-46f8-bddd-e010f0af617a rw quiet elevator=deadline rootflags=data=writeback nohz=off iommu=pt pci-stub.ids=1002:679e,1002:aaa0 radeon.dpm=1 radeon.fastfb=1 transparent_hugepage=never hugepagesz=1G hugepages=6 default_hugepagesz=1G
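With hugepagesz=1G hugepages=6, the kernel should reserve six 1 GiB pages at boot, i.e. 6144 MiB, which is more than the 4096 MiB the guest requests via -m 4096 -mem-path /dev/hugepages. A hypothetical helper that summarises the reservation from /proc/meminfo (HugePages_Total and Hugepagesize are standard fields there) might look like this:

```shell
# Hypothetical helper: summarise the hugepage pool from /proc/meminfo-style input.
# Prints "<pages> pages x <size> kB = <total> MiB"; fails if either field is missing.
hugepage_summary() {
  awk '/^HugePages_Total:/ { total = $2 }
       /^Hugepagesize:/    { size = $2 }
       END {
         if (total == "" || size == "") exit 1
         printf "%d pages x %d kB = %d MiB\n", total, size, total * size / 1024
       }'
}
```

Usage: hugepage_summary < /proc/meminfo. If all six 1 GiB pages were reserved, it prints "6 pages x 1048576 kB = 6144 MiB"; the 2M variant tried later in this thread (hugepages=3000) would give "3000 pages x 2048 kB = 6000 MiB".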

qemu command line:
/usr/bin/qemu-system-x86_64 -bios /usr/share/qemu/bios.bin \
  -cpu Opteron_G4 -smp 2,sockets=1,cores=2,threads=1 \
  -m 4096 -mem-prealloc -mem-path /dev/hugepages \
  -nodefaults -M q35 -enable-kvm \
  -monitor stdio -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
  -device vfio-pci,host=01:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1,romfile=/usr/share/qemu/Tahiti.rom \
  -device vfio-pci,host=01:00.1,bus=pcie.0 -device ahci,bus=pcie.0,id=ahci \
  -usb -usbdevice host:072f:9000 \
  -netdev bridge,br=br0,id=hostnet0 \
  -device virtio-net-pci,netdev=hostnet0,id=net0 \
  -drive file=/home/mwegrzynek/VM/vmwin801.qcow2,if=virtio,cache=writeback,aio=native \
  -drive file=/home/mwegrzynek/Pobrane/Windows/Windows8.iso,id=isocd -device ide-cd,bus=ahci.1,drive=isocd \
  -drive file=/home/mwegrzynek/Pobrane/Windows/virtio-win-0.1-74.iso,id=isocd2 -device ide-cd,bus=ahci.2,drive=isocd2 \
  -boot c \
  -serial none \
  -parallel none \
  -vga none \
  -nographic

lspci:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) I/O Memory Management Unit
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Trinity [Radeon HD 7660D]
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Root Port
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 03)
00:10.1 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 03)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 11)
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 11)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 14)
00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller (rev 01)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] FCH PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Hudson PCI to PCI bridge (PCIE port 0)
00:15.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Hudson PCI to PCI bridge (PCIE port 2)
00:15.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Hudson PCI to PCI bridge (PCIE port 3)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Function 5
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti LE [Radeon HD 7870 XT]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT HDMI Audio [Radeon HD 7970 Series]
03:00.0 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.1 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.2 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.3 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.4 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.5 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.6 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
03:00.7 USB controller: MosChip Semiconductor Technology Ltd. MCS9990 PCIe to 4‐Port USB 2.0 Host Controller
04:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
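To check which host driver currently owns each function of the card (01:00.0 and 01:00.1 above, which the pci-stub.ids option on the kernel command line is meant to claim), one can read the driver symlink in sysfs. The helper below is a hypothetical sketch; its optional second argument (a sysfs root, default /sys) exists only so it can be tried against a mock directory.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI function,
# or "none" if no driver is bound.
# $1: full PCI address, e.g. 0000:01:00.0
# $2: optional sysfs root (defaults to /sys); only useful for testing.
pci_driver() {
  link="${2:-/sys}/bus/pci/devices/$1/driver"
  if [ -L "$link" ]; then
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    basename "$(readlink "$link")"
  else
    echo none
  fi
}
```

For example: for f in 0000:01:00.0 0000:01:00.1; do echo "$f: $(pci_driver "$f")"; done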

Please let me know if any other information is needed.

Alex Williamson (alex-l-williamson) wrote:

Does this only happen with hugepages? Does it happen with 2M hugepages?

Michał Węgrzynek (peaquino) wrote:

It seems everything works OK without hugepages (i.e. without the -mem-prealloc and -mem-path QEMU parameters, and with transparent_hugepage, hugepagesz, hugepages and default_hugepagesz removed from the kernel command line).

Setting transparent_hugepage=never hugepagesz=2M hugepages=3000 default_hugepagesz=2M on the kernel command line makes things even less stable (starting Windows results in a host reboot).

Michał Węgrzynek (peaquino) wrote:

Sorry, I was probably just lucky. It doesn't work without hugepages either. The only way I can get the 7870XT to operate correctly in the guest is to make QEMU show

qemu-system-x86_64: vfio_dma_map(0x7f01db7deec0, 0xc0000, 0xaff40000, 0x2aaac00c0000) = -16 (Device or resource busy)

errors at startup.

Michał Węgrzynek (peaquino) wrote:

Success!

I was finally able to get QEMU running without any additional magic. All I needed was to load the vfio_iommu_type1 module with the disable_hugepages=1 option.

Thanks for pointing me in the right direction!
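The disable_hugepages=1 workaround can be applied either when loading the module by hand or persistently through modprobe configuration; both forms are shown as comments below (the file name under /etc/modprobe.d/ is arbitrary). The shell function is a hypothetical check, assuming the module exposes the parameter under /sys/module/vfio_iommu_type1/parameters/, where boolean module parameters read back as Y or N.

```shell
# Applying the option (root required):
#   modprobe -r vfio_pci vfio_iommu_type1   # only if already loaded and unused
#   modprobe vfio_iommu_type1 disable_hugepages=1
# Persistent form, e.g. in /etc/modprobe.d/vfio.conf (file name is arbitrary):
#   options vfio_iommu_type1 disable_hugepages=1

# Hypothetical check: succeed only if disable_hugepages is in effect.
# $1: optional path to the parameter file (defaults to the sysfs location).
vfio_hugepages_disabled() {
  param=${1:-/sys/module/vfio_iommu_type1/parameters/disable_hugepages}
  # Boolean module parameters read back as "Y" or "N".
  [ -r "$param" ] && [ "$(cat "$param")" = "Y" ]
}
```

For example: vfio_hugepages_disabled && echo "hugepage mappings disabled"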

Changed in qemu:
status: New → Fix Released