Comment 8 for bug 1628168

Kevin (kvasko) wrote : Re: [Bug 1628168] Re: Can't assign system with multiple GPUs to different VMs

This is no longer relevant; it appears to have been a hardware problem.

-Kevin

> On Mar 21, 2018, at 11:14 AM, Konstantinos Samaras-Tsakiris <email address hidden> wrote:
>
> Is this still relevant?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1628168
>
> Title:
> Can't assign system with multiple GPUs to different VMs
>
> Status in OpenStack Compute (nova):
> Confirmed
>
> Bug description:
> I have an OS Mitaka deployment that was done by Fuel (9.0).
>
> I have a system with 8 GPUs in a single box. We are trying to allow VMs
> to request access to GPU resources via this box.
>
> I know that with PCI Passthrough you can only have a device assigned
> to a single VM (e.g. 1 device <-> 1 VM). However, this box has 8 GPUs
> (8 separate devices). So I want support (1GPU -> 1VM) * 8, or (2GPU ->
> 1VM) * 4, (4GPU -> 1VM) * 2, or (8GPU -> 1VM) * 1.
>
> I have successfully been able to get the system to have 1 GPU <-> 1
> VM, however when I go to create another VM with a GPU I get "not
> enough hosts found".
>
> This is what I have done so far.
>
> /etc/nova/nova.conf
>
> Add:
> pci_passthrough_whitelist = [{"vendor_id": "10de", "product_id": "17c2"}]
>
> sudo gedit /etc/modules and add:
> pci_stub
> vfio
> vfio_iommu_type1
> vfio_pci
> kvm
> kvm_intel
>
> sudo vi /etc/default/grub
> GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"
>
> //BLACKLIST
>
> sudo gedit /etc/initramfs-tools/modules
> pci_stub ids=10de:17c2
> sudo update-initramfs -u
>
> On Controller Node:
>
> Edit nova.conf
>
> Add an alias specifically for the GPU you want to use:
>
> pci_alias={"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}
> Also add:
>
> scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
> scheduler_available_filters=nova.scheduler.filters.all_filters
> scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
> scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter
>
> #: source openrc
> nova flavor-key g1.xlarge set "pci_passthrough:alias"="titanx:1"
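
The configuration above only works if the alias, the whitelist, and the flavor extra spec all refer to the same device. The following is a minimal sketch of that cross-check, not part of nova: the helper names `parse_alias_request` and `alias_matches_whitelist` are hypothetical, and the comma-separated `name:count` form of the extra spec is an assumption about how the value is written.

```python
import json


def parse_alias_request(spec):
    """Parse a 'pci_passthrough:alias' value such as 'titanx:1'.

    Assumes comma-separated 'name:count' pairs (hypothetical helper).
    """
    requests = {}
    for part in spec.split(","):
        name, _, count = part.strip().partition(":")
        requests[name] = int(count)
    return requests


def alias_matches_whitelist(alias_json, whitelist_json):
    """Return True if the alias vendor/product pair appears in the whitelist."""
    alias = json.loads(alias_json)
    whitelist = json.loads(whitelist_json)
    return any(
        entry.get("vendor_id") == alias.get("vendor_id")
        and entry.get("product_id") == alias.get("product_id")
        for entry in whitelist
    )


# Values copied from the nova.conf settings quoted in this report.
whitelist = '[{"vendor_id": "10de", "product_id": "17c2"}]'
alias = '{"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}'

requested = parse_alias_request("titanx:1")
alias_name = json.loads(alias)["name"]

print(alias_matches_whitelist(alias, whitelist))
print(alias_name in requested, requested)
```

If either check prints `False`, the flavor is requesting a device that the compute node is not exposing, which would also end in "No valid host was found".
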
>
> Actual Results:
> When I go to create my second VM with the same flavor it errors out with this message. (If I create 1 VM it works and a GPU is assigned to that machine).
>
> Message: No valid host was found. There are not enough hosts available.
> Code: 500
> File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 392, in build_instances
>   context, request_spec, filter_properties)
> File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 436, in _schedule_instances
>   hosts = self.scheduler_client.select_destinations(context, spec_obj)
> File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 372, in wrapped
>   return func(*args, **kwargs)
> File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
>   return self.queryclient.select_destinations(context, spec_obj)
> File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
>   return getattr(self.instance, __name)(*args, **kwargs)
> File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 32, in select_destinations
>   return self.scheduler_rpcapi.select_destinations(context, spec_obj)
> File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 121, in select_destinations
>   return cctxt.call(ctxt, 'select_destinations', **msg_args)
> File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
>   retry=self.retry)
> File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
>   timeout=timeout, retry=retry)
> File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
>   retry=retry)
> File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _send
>   raise result
>
> Running SELECT * FROM pci_devices; on the nova database I get the
> following
>
> http://imgur.com/a/voGki
>
> As you can see, it shows that 7 devices are available.
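
The same count can be read from the query output without scanning the rows by eye. This is a rough sketch under stated assumptions: the rows below are illustrative stand-ins, not the contents of the linked screenshot, the PCI addresses are made up, and the column names `vendor_id`, `product_id`, and `status` are assumed to match the `pci_devices` table.

```python
from collections import Counter


def count_by_status(rows, vendor_id, product_id):
    """Tally device rows by status for one vendor/product pair."""
    return Counter(
        row["status"]
        for row in rows
        if row["vendor_id"] == vendor_id and row["product_id"] == product_id
    )


# Hypothetical rows: 8 GPUs, one already allocated to the first VM.
rows = [
    {
        "address": f"0000:0{i}:00.0",  # made-up PCI address
        "vendor_id": "10de",
        "product_id": "17c2",
        "status": "allocated" if i == 0 else "available",
    }
    for i in range(8)
]

counts = count_by_status(rows, "10de", "17c2")
print(counts["available"], counts["allocated"])
```

A non-zero "available" count alongside a scheduling failure suggests the problem lies outside the device inventory, for example in the scheduler filters or on the host itself.
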
>
> Expected Results:
>
> Another VM created with 1 more GPU used from the system.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/nova/+bug/1628168/+subscriptions