Libvirt error when using --max > 1 with vGPU

Bug #1780225 reported by Erwan Gallen
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Sylvain Bauza

Bug Description

Description
===========

Using devstack Rocky with an NVIDIA Tesla M10 + GRID driver on RHEL 7.5.
Profile used in nova: nvidia-35 (num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16)

I can launch instances one by one without any issue.
I cannot use --max with a value greater than 1.
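
For reference, the host side of this setup can be inspected through the standard mdev sysfs interface; a minimal sketch (the PCI address below is illustrative, not taken from this host):

```
# List the mdev types each physical GPU supports
ls /sys/class/mdev_bus/*/mdev_supported_types

# Remaining capacity of the nvidia-35 type on one pGPU (address is an example)
cat /sys/class/mdev_bus/0000:84:00.0/mdev_supported_types/nvidia-35/available_instances
```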

Expected result
===============

Be able to use the --max parameter with vGPU.
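
For context, a vGPU flavor like the one used below can be defined with the documented resources:VGPU extra spec; the sizing values here are illustrative, not taken from this deployment:

```
openstack flavor create --vcpus 2 --ram 4096 --disk 40 vgpu
openstack flavor set vgpu --property "resources:VGPU=1"
```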

Steps to reproduce
==================

[root@host2 ~]# openstack server list
+--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu |
+--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+

[root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 2 instance
+-------------------------------------+-----------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | iNiFmD6kNszw |
| config_drive | |
| created | 2018-07-05T09:19:25Z |
| flavor | vgpu (vgpu1) |
| hostId | |
| id | 5a8691a8-a18c-4c71-8541-be00f224fd82 |
| image | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
| key_name | myself |
| name | instance-1 |
| progress | 0 |
| project_id | fdea2c781db74ae593c5e9501e9290cc |
| properties | |
| security_groups | name='default' |
| status | BUILD |
| updated | 2018-07-05T09:19:25Z |
| user_id | 130a646fc362418f8b62ac11f1154942 |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------+

[root@host2 ~]# openstack server list
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR | | rhel75 | vgpu |
| 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11 | rhel75 | vgpu |
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

[root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 1 instance
+-------------------------------------+-----------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | MGxmntECb22S |
| config_drive | |
| created | 2018-07-05T09:19:45Z |
| flavor | vgpu (vgpu1) |
| hostId | |
| id | 24df940f-500b-44db-88e2-a6fd1fe915c0 |
| image | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
| key_name | myself |
| name | instance |
| progress | 0 |
| project_id | fdea2c781db74ae593c5e9501e9290cc |
| properties | |
| security_groups | name='default' |
| status | BUILD |
| updated | 2018-07-05T09:19:45Z |
| user_id | 130a646fc362418f8b62ac11f1154942 |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------+

[root@host2 ~]# openstack server list
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance | BUILD | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7 | rhel75 | vgpu |
| 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR | | rhel75 | vgpu |
| 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11 | rhel75 | vgpu |
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

[root@host2 ~]# openstack server list
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7 | rhel75 | vgpu |
| 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR | | rhel75 | vgpu |
| 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11 | rhel75 | vgpu |
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

[root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 1 instance
+-------------------------------------+-----------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | None |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | None |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| adminPass | 69crZEFxBT9j |
| config_drive | |
| created | 2018-07-05T09:21:43Z |
| flavor | vgpu (vgpu1) |
| hostId | |
| id | 4a172549-91c2-46cc-8895-cd2fcbb19430 |
| image | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
| key_name | myself |
| name | instance |
| progress | 0 |
| project_id | fdea2c781db74ae593c5e9501e9290cc |
| properties | |
| security_groups | name='default' |
| status | BUILD |
| updated | 2018-07-05T09:21:43Z |
| user_id | 130a646fc362418f8b62ac11f1154942 |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------+

[root@host2 ~]# openstack server list
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| 4a172549-91c2-46cc-8895-cd2fcbb19430 | instance | BUILD | | rhel75 | vgpu |
| 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7 | rhel75 | vgpu |
| 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR | | rhel75 | vgpu |
| 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11 | rhel75 | vgpu |
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

[root@host2 ~]# openstack server list
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
| 4a172549-91c2-46cc-8895-cd2fcbb19430 | instance | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe7d:a6d8, 10.0.0.4 | rhel75 | vgpu |
| 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7 | rhel75 | vgpu |
| 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR | | rhel75 | vgpu |
| 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11 | rhel75 | vgpu |
| 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu |
+--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

- Nova error:
{u'message': u'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance de2a5078-6acd-4ffd-9895-d664adb42296.', u'code': 500, u'details': u' File "/opt/stack/nova/nova/conductor/manager.py", line 579, in build_instances\n raise exception.MaxRetriesExceeded(reason=msg)\n', u'created': u'2018-07-05T07:32:52Z'}

- Libvirt error:
messages:Jul 5 03:32:51 host2 nova-compute: libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/25f56195-9719-4380-a90b-084d64307e06 is in use by driver QEMU, domain instance-00000019
messages:Jul 5 03:32:51 host2 nova-compute: ERROR nova.virt.libvirt.driver [None req-e04582ed-de22-4bfa-9253-92e687328a4c service nova] [instance: de2a5078-6acd-4ffd-9895-d664adb42296] Failed to start libvirt guest: libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/25f56195-9719-4380-a90b-084d64307e06 is in use by driver QEMU, domain instance-00000019

Tags: libvirt vgpu
Changed in nova:
assignee: nobody → Sylvain Bauza (sylvain-bauza)
importance: Undecided → High
melanie witt (melwitt) wrote:

Setting this to Confirmed as the Importance has been set and the bug has been Assigned.

Changed in nova:
status: New → Confirmed
tags: added: libvirt
Mariusz Karpiarz (mkarpiarz) wrote:

Hi,
Any update on this one?

As described in
https://bugs.launchpad.net/nova/+bug/1797269
the problem seems to be caused by nova assigning the same mdev device UUID to multiple guest definitions.

Here is an instance that launches correctly:

```
  <name>instance-00000241</name>
  <uuid>8b926f14-debc-4a15-a254-b068cf0366d6</uuid>
...
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
      <source>
        <address uuid='7f41921f-0683-4934-91e6-2102b4f63d42'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
```
Here is one that breaks:

```
  <name>instance-00000240</name>
  <uuid>94b1af59-bece-4c90-9a77-5527561b978e</uuid>
...
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
      <source>
        <address uuid='7f41921f-0683-4934-91e6-2102b4f63d42'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
```
Error message from nova-compute:

```
2019-01-23 12:18:09.690 7 ERROR nova.compute.manager [instance: 94b1af59-bece-4c90-9a77-5527561b978e] libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/7f41921f-0683-4934-91e6-2102b4f63d42 is in use by driver QEMU, domain instance-00000241
```
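
One quick way to confirm the duplicate assignment is to search the persistent domain definitions for the mdev UUID; more than one match means the race occurred (a sketch assuming a default libvirt/QEMU layout):

```
# Each mdev UUID should appear in at most one defined domain
grep -l '7f41921f-0683-4934-91e6-2102b4f63d42' /etc/libvirt/qemu/*.xml
```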

Mariusz Karpiarz (mkarpiarz) wrote:

As a workaround I set `max_concurrent_builds` to 1 in `nova.conf`.
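
For reference, that is a one-line change in the [DEFAULT] section of nova.conf on the compute node; serialising builds sidesteps the race at the cost of build throughput:

```
[DEFAULT]
# Build one instance at a time on this compute node so that two
# concurrent builds cannot pick the same free mdev device.
max_concurrent_builds = 1
```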

Here is an example of what happens without this change when I launch two instances that get scheduled to the same hypervisor (the listing refers to methods from https://github.com/openstack/nova/blob/d5bde60e5680962394e263a662a2f331b6da93cd/nova/virt/libvirt/driver.py#L3089; a minimal sketch of the race follows the timeline):

```
2019-02-26 12:24:59.363 Instance aeba3b4b-e8c9-499c-97e1-7468a6693ed1 in _allocate_mdevs(): checks which guests on the hypervisor have mdev devices attached, picks the first available device.
2019-02-26 12:24:59.795 Instance 41f5132e-1e99-4d8a-9be7-33ce1f5edbf2 in _allocate_mdevs(): picks the same mdev device, as the previous instance does not exist in libvirt yet.

2019-02-26 12:25:00.474 Instance 41f5132e-1e99-4d8a-9be7-33ce1f5edbf2 in _get_guest_xml(): libvirt config, including the selected mediated device, is created for the instance.
2019-02-26 12:25:00.477 Instance 41f5132e-1e99-4d8a-9be7-33ce1f5edbf2 transition state to "spawning".
2019-02-26 12:25:00.584 Instance aeba3b4b-e8c9-499c-97e1-7468a6693ed1 in _get_guest_xml(): libvirt config, including the selected mediated device, is created for the instance.

2019-02-26 12:25:01.868 Instance aeba3b4b-e8c9-499c-97e1-7468a6693ed1 in _create_domain_and_network(): Instance fails with a libvirtError due to the mediated device being used by 41f5132e-1e99-4d8a-9be7-33ce1f5edbf2.
2019-02-26 12:25:04.752 Instance 41f5132e-1e99-4d8a-9be7-33ce1f5edbf2 successfully created on hypervisor.
```
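
The timeline above is a classic check-then-act race: both builds read the set of free mdevs before either defines its domain. Below is a minimal, self-contained sketch of that failure mode; names like `build_instance` and the data structures are hypothetical stand-ins for the driver's helpers, not Nova's actual API:

```
import threading

# Stand-in for the pool of mdev devices on the host.
ALL_MDEVS = {
    "25f56195-9719-4380-a90b-084d64307e06",
    "7f41921f-0683-4934-91e6-2102b4f63d42",
}
defined_domains = {}            # instance name -> mdev UUID (stands in for libvirt state)
barrier = threading.Barrier(2)  # forces both builds past the "check" before either "acts"
picked = {}

def build_instance(name):
    # Check: compute the free mdevs from the currently defined domains.
    free = ALL_MDEVS - set(defined_domains.values())
    chosen = sorted(free)[0]    # both builds deterministically pick the same device
    barrier.wait()
    # Act: define the domain; the second definition now references a busy
    # mdev, which real libvirt rejects with "mediated device ... is in use".
    defined_domains[name] = chosen
    picked[name] = chosen

threads = [threading.Thread(target=build_instance, args=(n,))
           for n in ("instance-a", "instance-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert picked["instance-a"] == picked["instance-b"]  # the collision
print(picked)
```

With `max_concurrent_builds = 1` the second build only runs after the first domain is defined, so the "check" step sees the device as taken.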

Fan Zhang (fanzhang) wrote:

Any updates recently?

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/723858

Sylvain Bauza (sylvain-bauza) wrote:

In Stein, we merged the ability to have multiple Resource Providers, each of them being a pGPU.
In Ussuri, we added support for a specific vGPU type per pGPU.

I tested the above behaviour with https://review.opendev.org/723858 and it works now, unless you ask for a specific total capacity.

I'll close this bug, which was only about libvirt vGPUs; please look at https://bugs.launchpad.net/nova/+bug/1874664 for the related issue.
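
For reference, the per-pGPU configuration mentioned above looks roughly like this in an Ussuri-era nova.conf (section names follow the Ussuri virtual GPU docs; the types and PCI addresses below are illustrative, so check the documentation for your release):

```
[devices]
enabled_vgpu_types = nvidia-35, nvidia-36

[vgpu_nvidia-35]
device_addresses = 0000:84:00.0

[vgpu_nvidia-36]
device_addresses = 0000:85:00.0
```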

Changed in nova:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote: Related fix merged to nova (master)

Reviewed: https://review.opendev.org/723858
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=32bbbd698a2a9c5ca6f0b01662d94c64e21422b1
Submitter: Zuul
Branch: master

commit 32bbbd698a2a9c5ca6f0b01662d94c64e21422b1
Author: Sylvain Bauza <email address hidden>
Date: Tue Apr 28 12:17:08 2020 +0200

    Test multi create with vGPUs

    We had a bug in Rocky where multicreate wasn't working correctly, but given
    in Stein we provided Resource Providers for each pGPU, this is fixed now.

    NOTE: We have a related bug #1874664 because multicreate doesn't work with
    nested Resource Providers.
    We could btw. move the regression test to a specific module in the
    regressions tests subdirectory.

    Change-Id: I8154917ff142987e80dc711e3b2b3965a21f08d0
    Related-Bug: #1780225
    Related-Bug: #1874664
