Can't assign system with multiple GPUs to different VMs

Bug #1628168 reported by Kevin
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

I have an OpenStack (OS) Mitaka deployment that was done with Fuel (9.0).

I have a system with 8 GPUs in a single box. We are trying to allow VMs to request access to GPU resources on this box.

I know that with PCI passthrough a device can only be assigned to a single VM (i.e. 1 device <-> 1 VM). However, this box has 8 GPUs (8 separate devices), so I want to support (1 GPU -> 1 VM) * 8, or (2 GPU -> 1 VM) * 4, or (4 GPU -> 1 VM) * 2, or (8 GPU -> 1 VM) * 1.

I have successfully been able to assign 1 GPU <-> 1 VM; however, when I go to create another VM with a GPU I get "not enough hosts found".

This is what I have done so far.

/etc/nova/nova.conf

Add:
 pci_passthrough_whitelist = [{"vendor_id": "10de", "product_id": "17c2"}]
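
(Side note: the whitelist above matches every 10de:17c2 card in the box by vendor/product id. If only some of the cards should be exposed, my understanding is the whitelist can also match individual devices by PCI address — the addresses below are only examples, not necessarily the ones on this box:)

 pci_passthrough_whitelist = [{"address": "0000:04:00.0"},
                              {"address": "0000:05:00.0"}]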

sudo gedit /etc/modules and add:
 pci_stub
 vfio
 vfio_iommu_type1
 vfio_pci
 kvm
 kvm_intel

sudo vi /etc/default/grub
 GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"
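
(These kernel parameters only take effect after the grub config is regenerated and the host rebooted — on Ubuntu that should be something like:)

 sudo update-grub
 sudo reboot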

//BLACKLIST

sudo gedit /etc/initramfs-tools/modules
 pci_stub ids=10de:17c2
 sudo update-initramfs -u
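
(After the reboot, a quick way to confirm the stub actually claimed all 8 cards — assuming the same 10de:17c2 id — is:)

 lspci -nnk -d 10de:17c2
 # each card should report "Kernel driver in use: pci-stub" (or vfio-pci)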

On Controller Node:

Edit nova.conf

Add, specifically for the GPU you want to use:

pci_alias={"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}
 Add:

scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
 scheduler_available_filters=nova.scheduler.filters.all_filters
 scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
 scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter

#: source openrc
 nova flavor-key g1.xlarge set "pci_passthrough:alias"="titanx:1"
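
(If I understand the alias syntax correctly, the number after the colon is how many devices of that alias the flavor requests, so the 2-GPUs-per-VM case would presumably just be another flavor — "g1.2xlarge" here is a made-up name:)

 nova flavor-key g1.2xlarge set "pci_passthrough:alias"="titanx:2"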

Actual Results:
When I go to create my second VM with the same flavor, it errors out with the message below. (If I create 1 VM it works and a GPU is assigned to that machine.)

Message: No valid host was found. There are not enough hosts available.
 Code: 500
 File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 392, in build_instances context, request_spec, filter_properties) File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 436, in _schedule_instances hosts = self.scheduler_client.select_destinations(context, spec_obj) File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 372, in wrapped return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations return self.queryclient.select_destinations(context, spec_obj) File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method return getattr(self.instance, __name)(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 32, in select_destinations return self.scheduler_rpcapi.select_destinations(context, spec_obj) File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 121, in select_destinations return cctxt.call(ctxt, 'select_destinations', **msg_args) File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call retry=self.retry) File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send timeout=timeout, retry=retry) File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send retry=retry) File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _send raise result

Running SELECT * FROM pci_devices; against the nova database I get the following:

http://imgur.com/a/voGki

As you can see, it shows 7 devices as available.
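
(A more targeted query that shows which device is bound to which instance — column names as I see them in the Mitaka pci_devices table:)

 SELECT address, status, instance_uuid
 FROM pci_devices
 WHERE deleted = 0 AND vendor_id = '10de' AND product_id = '17c2';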

Expected Results:

Another VM created with 1 more GPU used from the system.

Revision history for this message
Kevin (kvasko) wrote :

I was doing some more investigating and found this in nova-all.log. It looks like an issue where the device (0f:00.0) is busy; however, it shouldn't be, as the only one in use *should* be 10:00.0.

All devices seem to be claimed by pci-stub, which from my understanding means they can't be claimed by drivers in the currently running host OS.

<179>Sep 27 18:53:48 node-13 nova-conductor: 2016-09-27 18:53:48.631 24595 ERROR nova.scheduler.utils [req-dfd5dfe7-ea36-4ce0-8fe7-2412df59db20 11a8bdff50d34c64b2a9fc2b477af74b 81d1532551c2436793417cd7ef0abf35 - - -] [instance: e5fadc3b-6fab-4524-9a35-c8ac954014bd] Error from last host: cirrascale1 (node cirrascale1): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance e5fadc3b-6fab-4524-9a35-c8ac954014bd was re-scheduled: internal error: process exited while connecting to monitor: 2016-09-27T18:53:46.506916Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: Error: Failed to setup INTx fd: Device or resource busy\n2016-09-27T18:53:46.507929Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device initialization failed\n2016-09-27T18:53:46.507952Z qemu-system-x86_64: -device vfio-pci,host=0f:00.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized\n\n"]

Revision history for this message
Kevin (kvasko) wrote :

So, a little more information: I was able to get more than 1 VM to start with a GPU attached (e.g. I had 2 VMs, each with 1 GPU attached). I then restarted the host machine that has the GPUs.

It appears that some of the GPUs are getting into an "in-use" state and won't come back out of it.

On the host system that has the GPUs, when I reboot the machine and run lspci -vnnn | grep VGA, all 8 GPUs show up as follows:

04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])

05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])
0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])
0e:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])
0f:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])
10:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev a1) (prog-if 00 [VGA controller])

This is with 0 VM instances running that have a GPU associated with them.

At this point, after a fresh reboot, I started and stopped multiple VMs (3 VMs, each with 1 GPU attached), stopped them, and started them back up with no issues. I did that a few more times, and then at random I saw this appear for one of the cards when running lspci -vnnn | grep VGA:

 0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev ff) (prog-if ff)

I've got 2 machines running with a GPU attached, and at this point any time I try to start another VM with a GPU I get the "no hosts found" error. So here is what I *think* is happening.

After rebooting the host machine, none of the GPUs are in that weird (prog-if ff) state, and VMs start up fine with a GPU until one of the GPUs goes into that (rev ff) (prog-if ff) state. From then on, whenever OS tries to schedule a new VM it tries to use the GPU that is (rev ff) (prog-if ff), since it is still marked as available in the MySQL database, and so no further VMs can be created with a GPU.

I'm not sure what is causing the GPUs to go into the (rev ff) (prog-if ff) state. All I am doing is creating a VM, checking that it launches successfully, logging into it, making sure the VM has a GPU attached, and then deleting it from OS.

I'm testing with the CentOS7 image from here: http://docs.openstack.org/image-guide/obtain-images.html

I'm going to try to debug this issue some more to see if I can narrow down the cause of the cards going into that odd state.
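
(One thing I plan to check is whether these cards expose a reset method at all, since a device that can't be reset cleanly after a guest releases it could plausibly end up stuck like this — a sketch for one card:)

 lspci -vvv -s 0d:00.0 | grep -i flreset
 ls -l /sys/bus/pci/devices/0000:0d:00.0/reset   # present only if a reset mechanism exists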

Revision history for this message
Kevin (kvasko) wrote :

So I seem to be able to reproduce this more frequently with the Ubuntu image on the page I linked above.

Looking at dmesg I see:

[ 0.638033] pci 0000:00:05.0: unknown header type 7f, ignoring device

If I run lspci -vnnn | grep VGA on the GPU host system I see:

0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX TITAN X] [10de:17c2] (rev ff) (prog-if ff)

At this point it won't launch another GPU-based machine.

I'm currently trying a different libvirt version to see if that resolves the issue, based on this information: http://www.spinics.net/lists/kvm/msg51006.html

Matt Riedemann (mriedem)
tags: added: libvirt pci
Revision history for this message
Ludovic Beliveau (ludovic-beliveau) wrote :

I'm confused: you are saying that "All devices seem to be claimed by pci-stub", but looking at the qemu command line, it is using the vfio-pci driver (pci-stub is legacy). So make sure your device is using the right driver when it is passed through to the guest (e.g. /sys/bus/pci/devices/0000\:89\:00.0/driver should point to vfio-pci). Based on the qemu command line it looks like you're fine, but it still needs to be validated ...
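
For example, substituting the GPU's PCI address (just an illustration of the check I mean):

 readlink /sys/bus/pci/devices/0000:0f:00.0/driver
 # should end in .../vfio-pci while the device is attached to a guest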

When you delete the guest, libvirt should bind the device back to its original driver. In my case (I'm using Intel NICs), the driver goes back to ixgbe. That might help validate whether the issue is with libvirt (like you are mentioning above).

Also, look at the libvirt log (/var/log/libvirt/libvirtd.log) and the qemu logs (under /var/log/libvirt/qemu/); there might be something of interest there.

Revision history for this message
Kevin (kvasko) wrote :

I followed the instructions in this guide to have pci-stub claim the devices (blacklisting them from the host drivers).

https://www.pugetsystems.com/labs/articles/Multiheaded-NVIDIA-Gaming-using-Ubuntu-14-04-KVM-585/

I'm using Ubuntu 14.04, which has a 3.x kernel; this (http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-3-host.html) said vfio-pci only worked with kernel 4.x+.
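
(If it matters: even without the newer ids= option, my understanding is a device can be handed to vfio-pci manually through sysfs — a sketch I haven't run on this box, using one of the GPU addresses as an example:)

 sudo modprobe vfio-pci
 echo 0000:0f:00.0 | sudo tee /sys/bus/pci/devices/0000:0f:00.0/driver/unbind
 echo 10de 17c2   | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id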

I ended up resetting the host and putting it back into OS, hoping that I had messed something up in the process and that redoing it would resolve the issue. However, it seems I took a step backward: now the CentOS7 image won't work at all. I see no errors that would indicate why the PCI device won't show up in the CentOS7 VM on the first start of the VM.

I added all the configuration settings to nova.conf and blacklisted the GPUs; they show up in the nova database properly.

I started up a CentOS7 machine (see the attached dmesg log).

I checked libvirtd.log and saw this:

2016-09-30 16:43:41.162+0000: 9360: warning : qemuDomainObjTaint:1900 : Domain id=6 name='instance-0000012e' uuid=4f12ae0c-0d50-4d83-9f8b-3061273b64da is tainted: high-privileges

The ./qemu/instance-0000012e.log looks like this:

2016-09-30 16:43:41.162+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-0000012e -S -machine pc-i440fx-vivid,accel=kvm,usb=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 16384 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid 4f12ae0c-0d50-4d83-9f8b-3061273b64da -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=13.0.0,serial=8e34e073-7b4c-4e69-84fa-2d044032ad30,uuid=4f12ae0c-0d50-4d83-9f8b-3061273b64da,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000012e.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/4f12ae0c-0d50-4d83-9f8b-3061273b64da/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:0c:a0:ea,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/4f12ae0c-0d50-4d83-9f8b-3061273b64da/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device vfio-pci,host=10:00.0,id=hostdev0,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Domain id=6 is tainted: high-privileges
char device redirected to /dev/pts/2 (label charserial1)

vfio-pci,host=10:00.0 is the GPU on the host, so I'm not sure why it won't show up (dmesg of th...


Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Sean Dague (sdague) wrote :

Automatically discovered version mitaka in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.mitaka
Revision history for this message
Konstantinos Samaras-Tsakiris (kosamara) wrote :

Is this still relevant?

Revision history for this message
Kevin (kvasko) wrote : Re: [Bug 1628168] Re: Can't assign system with multiple GPUs to different VMs

At this point it is not relevant anymore; it was seemingly a hardware problem.

-Kevin

> On Mar 21, 2018, at 11:14 AM, Konstantinos Samaras-Tsakiris <email address hidden> wrote:
>
> Is this still relevant?
