I followed the instructions from this guide to mark the devices as blacklisted by pci-stub:
https://www.pugetsystems.com/labs/articles/Multiheaded-NVIDIA-Gaming-using-Ubuntu-14-04-KVM-585/
I'm using Ubuntu 14.04, which has a 3.x kernel, and this post (http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-3-host.html) said vfio-pci only worked with kernel version 4.x+.
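To confirm which host driver actually owns a GPU at a given moment, the `driver` symlink under sysfs can be read. This is a minimal sketch, assuming the standard Linux sysfs layout; the function name `bound_driver` is made up for illustration, and the address in the usage note assumes the usual `0000` PCI domain prefix.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the kernel driver bound to a PCI device, or None.

    pci_addr is a full address such as "0000:10:00.0".
    sysfs_root is a parameter only so the function can be tried
    against a fake directory tree.
    """
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        # No driver bound, or the device does not exist.
        return None
    # The symlink target ends in the driver name, e.g. .../drivers/pci-stub
    return os.path.basename(os.readlink(link))
```

On the host, `bound_driver("0000:10:00.0")` should report `pci-stub` while the device is idle if the blacklisting took effect. As far as I understand, libvirt rebinds a managed host device to `vfio-pci` when the guest starts, so the answer may differ while an instance is running.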
I ended up resetting the host and putting it back into OS, hoping I had messed something up along the way and that redoing the process would resolve it. Instead I seem to have taken a step sideways: now even the CentOS 7 image won't work. I see no errors that would indicate why the PCI device doesn't show up in the CentOS 7 VM the first time the VM is started.
I added all the configuration settings to nova.conf and blacklisted the GPUs; they show up in the nova database properly.
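For reference, this is roughly the shape of the nova.conf settings involved, with a small Python check that the alias and the whitelist point at the same device. It is a sketch under assumptions, not my exact config: it uses the Mitaka-era `pci_passthrough_whitelist` and `pci_alias` options in `[DEFAULT]`, the vendor/product IDs are the `10de:17c2` pair that lspci reports for the TITAN X, the alias name `titanx` is hypothetical, and it assumes each option holds a single JSON object.

```python
import configparser
import json

# Hypothetical nova.conf fragment; the alias name "titanx" is made up.
SAMPLE_NOVA_CONF = """
[DEFAULT]
pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "17c2"}
pci_alias = {"vendor_id": "10de", "product_id": "17c2", "name": "titanx"}
"""


def pci_settings(conf_text):
    """Parse the two PCI passthrough options out of nova.conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    defaults = parser["DEFAULT"]
    whitelist = json.loads(defaults["pci_passthrough_whitelist"])
    alias = json.loads(defaults["pci_alias"])
    return whitelist, alias


def alias_matches_whitelist(whitelist, alias):
    """True if the alias names the same vendor/product as the whitelist."""
    keys = ("vendor_id", "product_id")
    return all(alias.get(k) == whitelist.get(k) for k in keys)
```

A flavor would then request the device through the `pci_passthrough:alias` extra spec (for example `titanx:1` with the hypothetical alias above).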
I started up a CentOS7 machine (see the attached dmesg log).
I checked the libvirtd.log and saw this:
2016-09-30 16:43:41.162+0000: 9360: warning : qemuDomainObjTaint:1900 : Domain id=6 name='instance-0000012e' uuid=4f12ae0c-0d50-4d83-9f8b-3061273b64da is tainted: high-privileges
The ./qemu/instance-0000012e.log looks like this:
2016-09-30 16:43:41.162+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-0000012e -S -machine pc-i440fx-vivid,accel=kvm,usb=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 16384 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid 4f12ae0c-0d50-4d83-9f8b-3061273b64da -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=13.0.0,serial=8e34e073-7b4c-4e69-84fa-2d044032ad30,uuid=4f12ae0c-0d50-4d83-9f8b-3061273b64da,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000012e.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/4f12ae0c-0d50-4d83-9f8b-3061273b64da/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:0c:a0:ea,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/4f12ae0c-0d50-4d83-9f8b-3061273b64da/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device vfio-pci,host=10:00.0,id=hostdev0,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Domain id=6 is tainted: high-privileges
char device redirected to /dev/pts/2 (label charserial1)
vfio-pci,host=10:00.0 is the GPU on the host, so I'm not sure why it won't show up (the dmesg output of the VM instance is attached).
At this point the GPU had gone into the following state (sometimes it does, sometimes it doesn't; I'm not sure what it means):
10:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev ff) (prog-if ff)
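A quick way to spot every device in this state is to scan the `lspci -nn` output for `(rev ff)`. My understanding, which I have not confirmed for this host, is that `rev ff` generally means reads of the device's config space are returning all ones, i.e. the device is not responding (for example after a failed reset or while powered down). A minimal sketch; the function name is made up, and the second line of the sample is invented for contrast:

```python
def unreadable_devices(lspci_output):
    """Return the PCI addresses of lines that report '(rev ff)'."""
    flagged = []
    for line in lspci_output.splitlines():
        if "(rev ff)" in line:
            # The address is the first whitespace-separated field.
            flagged.append(line.split()[0])
    return flagged


SAMPLE = (
    "10:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 "
    "[GeForce GTX TITAN X] [10de:17c2] (rev ff) (prog-if ff)\n"
    # Hypothetical healthy device, not taken from the host:
    "0f:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 "
    "[GeForce GTX TITAN X] [10de:17c2] (rev a1)\n"
)
```

On the host the input would come from something like `subprocess.check_output(["lspci", "-nn"], text=True)`.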
I then tried to start up another instance.
In libvirtd.log I see this:
2016-09-30 16:47:47.786+0000: 9357: warning : qemuDomainObjTaint:1900 : Domain id=7 name='instance-0000012f' uuid=dc37c94f-d6d2-42ac-8fff-1c3a6604f317 is tainted: high-privileges
2016-09-30 16:47:52.308+0000: 9355: error : qemuMonitorIORead:554 : Unable to read from monitor: Connection reset by peer
2016-09-30 16:47:52.308+0000: 9355: error : qemuMonitorIO:697 : internal error: early end of file from monitor: possible problem:
2016-09-30T16:47:51.356627Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: Error: Failed to setup INTx fd: Device or resource busy
2016-09-30T16:47:51.358248Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device initialization failed
2016-09-30T16:47:51.358300Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
2016-09-30 16:47:52.308+0000: 9357: error : qemuProcessWaitForMonitor:2052 : internal error: process exited while connecting to monitor: 2016-09-30T16:47:51.356627Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: Error: Failed to setup INTx fd: Device or resource busy
2016-09-30T16:47:51.358248Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device initialization failed
2016-09-30T16:47:51.358300Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
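Because the same three QEMU messages are repeated inside several libvirt errors, it helps to boil the log down to one list of distinct vfio-pci failures per host device. A rough sketch; the regular expression is only fitted to the message format seen in this log, and the function name is made up:

```python
import re
from collections import defaultdict

# Matches "-device vfio-pci,host=<addr>,<other options>: <message>"
VFIO_ERR = re.compile(
    r"-device vfio-pci,host=(?P<host>[0-9a-f:.]+),[^:]*: (?P<msg>.+)$"
)


def vfio_failures(log_text):
    """Map each host PCI address to its distinct vfio-pci error messages."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        m = VFIO_ERR.search(line)
        if m and m.group("msg") not in failures[m.group("host")]:
            failures[m.group("host")].append(m.group("msg"))
    return dict(failures)
```

Run over the libvirtd.log excerpt, this should leave a single entry for `0f:00.0` whose first message is the `Failed to setup INTx fd: Device or resource busy` error; the two later messages look like consequences of it.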
The associated instance-0000012f.log:
2016-09-30 16:47:47.786+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-0000012f -S -machine pc-i440fx-vivid,accel=kvm,usb=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 16384 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid dc37c94f-d6d2-42ac-8fff-1c3a6604f317 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=13.0.0,serial=8e34e073-7b4c-4e69-84fa-2d044032ad30,uuid=dc37c94f-d6d2-42ac-8fff-1c3a6604f317,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000012f.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/dc37c94f-d6d2-42ac-8fff-1c3a6604f317/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:cf:9c:1d,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/dc37c94f-d6d2-42ac-8fff-1c3a6604f317/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Domain id=7 is tainted: high-privileges
char device redirected to /dev/pts/4 (label charserial1)
2016-09-30T16:47:51.356627Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: Error: Failed to setup INTx fd: Device or resource busy
2016-09-30T16:47:51.358248Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device initialization failed
2016-09-30T16:47:51.358300Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
2016-09-30 16:47:52.308+0000: shutting down
So at this point I can't add another VM with a GPU (even though 7 devices should be free).
If I delete all VMs with associated GPUs and then start another instance, it lets me start a GPU instance, but I still can't see any GPU attached to the VM (just like the first time). If I then try to start a second VM, the same thing happens.