Hi everyone, I'm using OpenStack Queens with Kolla Ansible.
https://docs.openstack.org/kolla-ansible/latest/
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c08ab2bc6e8e registry.vscaler.com:5000/kolla/centos-binary-neutron-openvswitch-agent:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks neutron_openvswitch_agent
65c7efca89c1 registry.vscaler.com:5000/kolla/centos-binary-openvswitch-vswitchd:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks openvswitch_vswitchd
982fdc925784 registry.vscaler.com:5000/kolla/centos-binary-openvswitch-db-server:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks openvswitch_db
05c2908fcff9 registry.vscaler.com:5000/kolla/centos-binary-nova-compute:6.0.1 "kolla_start" 11 weeks ago Up 3 weeks nova_compute
b22fcbc6b48f registry.vscaler.com:5000/kolla/centos-binary-nova-libvirt:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks nova_libvirt
754fda56368e registry.vscaler.com:5000/kolla/centos-binary-nova-ssh:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks nova_ssh
72ca828b94e8 registry.vscaler.com:5000/kolla/centos-binary-cron:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks cron
d86fd4806efe registry.vscaler.com:5000/kolla/centos-binary-kolla-toolbox:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks kolla_toolbox
d0d7bf199bf1 registry.vscaler.com:5000/kolla/centos-binary-fluentd:6.0.1 "kolla_start" 11 weeks ago Up 5 weeks fluentd
I have 4 NVIDIA Corporation GP100GL cards configured with the NVIDIA driver package NVIDIA-GRID-RHEL-7.5-390.72-390.75-391.81.
The nova.conf is configured with:
[devices]
enabled_vgpu_types = nvidia-96
So I have 16 vGPUs, each to be used by a single instance. Creating instances one at a time through the Horizon interface works without problems; the error appears when I select 5 or 6 instances to create at the same time.
Each instance has 8 GB RAM, 4 vCPUs and 1 vGPU, with no volume.
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2031, in _build_and_run_instance
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] block_device_info=block_device_info)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3089, in spawn
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] destroy_disks_on_failure=True)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5614, in _create_domain_and_network
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] destroy_disks_on_failure)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] self.force_reraise()
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] six.reraise(self.type_, self.value, self.tb)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5583, in _create_domain_and_network
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] post_xml_callback=post_xml_callback)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5502, in _create_domain
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] guest.launch(pause=pause)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] self._encoded_xml, errors='ignore')
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] self.force_reraise()
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] six.reraise(self.type_, self.value, self.tb)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] return self._domain.createWithFlags(flags)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] result = proxy_call(self._autowrap, f, *args, **kwargs)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] rv = execute(f, *args, **kwargs)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] six.reraise(c, e, tb)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] rv = meth(*args, **kwargs)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8] libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/8d219724-ba8c-11e8-8c75-0cc47ad9af5c is in use by driver QEMU, domain instance-00001396
2018-10-09 16:53:41.815 7 ERROR nova.compute.manager [instance: bf4b679f-50ec-4ade-a530-a822486999a8]
2018-10-09 16:53:41.818 7 INFO nova.compute.manager [req-15e5d59f-4a08-46e2-b748-94a4fe5daf2f 7e942918147f479e8c07877240de5d16 c5957eec2ab84cd5b45b9324ed9a3b29 - default default] [instance: bf4b679f-50ec-4ade-a530-a822486999a8] Terminating instance
I think this is a race condition: a vGPU is assigned to one instance, and then Nova tries to assign the same vGPU to another instance even though it is already in use.
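To make the suspected race concrete, here is a minimal Python sketch. It is not Nova's code and I have not confirmed that Nova fails in exactly this way: the device names and the helpers pick_free, racy_pair, LockedPool and claim_concurrently are all made up for illustration. It only assumes the failure is a check-then-act problem, where two builds both look for a free device before either one records its choice.

```python
import threading

# Hypothetical device names; real mdev UUIDs live under /sys/bus/mdev/devices.
MDEVS = ["mdev-%02d" % i for i in range(16)]


def pick_free(in_use):
    """Return the first device not yet marked as in use (the check step only)."""
    return next(m for m in MDEVS if m not in in_use)


def racy_pair():
    """Two builds both run the check before either records its choice."""
    in_use = set()
    first = pick_free(in_use)
    second = pick_free(in_use)  # still sees the same device as free
    in_use.update([first, second])
    return first, second


class LockedPool:
    """Check and record under one lock, so each device is handed out once."""

    def __init__(self):
        self._in_use = set()
        self._lock = threading.Lock()

    def claim(self):
        with self._lock:
            mdev = pick_free(self._in_use)
            self._in_use.add(mdev)
            return mdev


def claim_concurrently(count):
    """Claim `count` devices from `count` threads and return what they got."""
    pool = LockedPool()
    results = []
    results_lock = threading.Lock()

    def worker():
        mdev = pool.claim()
        with results_lock:
            results.append(mdev)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch racy_pair() hands the same device to both builds, which would match the "is in use by driver QEMU" error above, while claim_concurrently(6) returns six different devices because the check and the record happen as one step.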
Thanks!
This is a duplicate of https://bugs.launchpad.net/nova/+bug/1780225