Can't assign system with multiple GPUs to different VMs

Bug #1627196 reported by Kevin
This bug affects 1 person
Affects: nova-solver-scheduler
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

I have an OpenStack Mitaka deployment that was installed with Fuel 9.0.

I have a system with 8 GPUs in a single box. We are trying to let VMs request GPU resources from this box.

I know that with PCI passthrough a device can be assigned to only one VM at a time (1 device <-> 1 VM). However, this box has 8 GPUs (8 separate devices), so I want to support any of these layouts: (1 GPU -> 1 VM) * 8, (2 GPUs -> 1 VM) * 4, (4 GPUs -> 1 VM) * 2, or (8 GPUs -> 1 VM) * 1.
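
The layouts listed above are the ways to split 8 devices into equal groups with none left over. A minimal sketch of that arithmetic, where the helper name `even_layouts` is made up for illustration and is not part of Nova:

```python
def even_layouts(total_gpus):
    """Return (gpus_per_vm, vm_count) pairs that use every GPU exactly once."""
    return [
        (per_vm, total_gpus // per_vm)
        for per_vm in range(1, total_gpus + 1)
        if total_gpus % per_vm == 0  # keep only group sizes that divide evenly
    ]


# 8 is the GPU count stated in this report.
for per_vm, vm_count in even_layouts(8):
    print(f"({per_vm} GPU -> 1 VM) * {vm_count}")
```

Mixed layouts (for example one 4-GPU VM plus four 1-GPU VMs) are not covered by this sketch; it only enumerates the uniform cases named in the report.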

I have successfully gotten 1 GPU <-> 1 VM working. However, when I create another VM with a GPU, I get "not enough hosts found".

This is what I have done so far.

/etc/nova/nova.conf

Add:
pci_passthrough_whitelist = [{"vendor_id": "10de", "product_id": "17c2"}]
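
As a rough illustration of what the whitelist entry above expresses, the sketch below treats it as a JSON list of specs and checks whether a device's IDs match one of them. This is a simplified model, not Nova's own matching code (which, as far as I know, also handles fields such as PCI address); the function name `is_whitelisted` and the non-matching product ID `ffff` are made up for the example.

```python
import json

# Whitelist value as given in this report.
WHITELIST = '[{"vendor_id": "10de", "product_id": "17c2"}]'


def is_whitelisted(device, whitelist_json=WHITELIST):
    """Return True if every key in at least one spec equals the device's value."""
    specs = json.loads(whitelist_json)
    return any(
        all(device.get(key) == value for key, value in spec.items())
        for spec in specs
    )


print(is_whitelisted({"vendor_id": "10de", "product_id": "17c2"}))
print(is_whitelisted({"vendor_id": "10de", "product_id": "ffff"}))  # hypothetical ID
```

Because the spec names only a vendor and product ID, all 8 identical GPUs in the box would match the same entry under this model.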

sudo gedit /etc/modules and add:
pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel

sudo vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"
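
To double-check that the IOMMU options above are spelled as intended, the parameter string can be split into key/value pairs. A small sketch; the helper name `kernel_params` is an assumption for illustration only:

```python
def kernel_params(cmdline):
    """Split a kernel command line into a dict; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


# Value of GRUB_CMDLINE_LINUX_DEFAULT from this report.
GRUB_DEFAULT = "quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"

params = kernel_params(GRUB_DEFAULT)
print(params.get("intel_iommu"))
print(params.get("vfio_iommu_type1.allow_unsafe_interrupts"))
```

The same function can be pointed at the contents of /proc/cmdline on the running compute node to see which parameters the booted kernel actually received.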

//BLACKLIST

sudo gedit /etc/initramfs-tools/modules
pci_stub ids=10de:17c2
sudo update-initramfs -u

On Controller Node:

Edit nova.conf

Add an alias for the specific GPU you want to use:

pci_alias={"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}

Also add:

scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter

#: source openrc
nova flavor-key g1.xlarge set "pci_passthrough:alias"="titanx:1"
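
The value after the colon in the flavor key is, as I understand the Nova PCI passthrough documentation, the number of devices requested under that alias. A hedged sketch of how the extra spec would look for other GPU counts; `alias_extra_spec` is a made-up helper, not a Nova or novaclient function:

```python
def alias_extra_spec(alias_name, device_count):
    """Build the flavor extra spec that requests `device_count` devices of an alias."""
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    return {"pci_passthrough:alias": f"{alias_name}:{device_count}"}


# "titanx" is the alias name from this report; the counts are the ones it asks about.
for count in (1, 2, 4, 8):
    print(alias_extra_spec("titanx", count))
```

Each count would need its own flavor, since the extra spec is attached to the flavor rather than to the individual boot request.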

Creating one VM works. When I create a second VM with the same flavor, it fails with this message:

Message: No valid host was found. There are not enough hosts available.
Code: 500
File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 392, in build_instances
    context, request_spec, filter_properties)
File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 436, in _schedule_instances
    hosts = self.scheduler_client.select_destinations(context, spec_obj)
File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 372, in wrapped
    return func(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
    return self.queryclient.select_destinations(context, spec_obj)
File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
    return getattr(self.instance, __name)(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 32, in select_destinations
    return self.scheduler_rpcapi.select_destinations(context, spec_obj)
File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 121, in select_destinations
    return cctxt.call(ctxt, 'select_destinations', **msg_args)
File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
    retry=self.retry)
File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
    timeout=timeout, retry=retry)
File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
    retry=retry)
File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _send
    raise result

Running SELECT * FROM pci_devices; on the nova database, I get the following:

http://imgur.com/a/voGki

As you can see, 7 devices are shown as available.
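
Since the query result is only available as a screenshot, the sketch below shows one way to summarize such a table by status instead of reading rows by eye. It runs against an in-memory SQLite table, not a real nova database; the column subset, the PCI addresses, and the instance UUID are all made-up placeholders, and the 1 allocated / 7 available split simply mirrors what the report describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed subset of columns; the real nova pci_devices table has more.
conn.execute(
    "CREATE TABLE pci_devices ("
    "address TEXT, vendor_id TEXT, product_id TEXT, status TEXT, instance_uuid TEXT)"
)

# Illustrative rows only: 8 GPUs, the first one allocated to a placeholder instance.
rows = [
    (
        f"0000:{bus:02x}:00.0",  # hypothetical PCI address
        "10de",
        "17c2",
        "allocated" if i == 0 else "available",
        "hypothetical-uuid" if i == 0 else None,
    )
    for i, bus in enumerate(range(4, 12))
]
conn.executemany("INSERT INTO pci_devices VALUES (?, ?, ?, ?, ?)", rows)

# Count devices of the whitelisted type per status.
counts = dict(
    conn.execute(
        "SELECT status, COUNT(*) FROM pci_devices "
        "WHERE vendor_id = ? AND product_id = ? GROUP BY status",
        ("10de", "17c2"),
    ).fetchall()
)
print(counts)
```

The same GROUP BY query could be run directly in the database client against the real table, which makes the available count easy to compare with what the scheduler reports.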

Kevin (kvasko)
Changed in nova-solver-scheduler:
status: New → Invalid