Asking for different vGPU types is racy

Bug #1900006 reported by Sylvain Bauza
This bug affects 3 people
Affects                   Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)  Fix Released  Medium      Sylvain Bauza
  Ussuri                  Fix Released  Undecided   Unassigned
  Victoria                Fix Released  Medium      Unassigned

Bug Description

When testing virtual GPUs on Victoria, I wanted to enable several vGPU types:

[devices]
enabled_vgpu_types = nvidia-320,nvidia-321

[vgpu_nvidia-320]
device_addresses = 0000:04:02.1,0000:04:02.2

[vgpu_nvidia-321]
device_addresses = 0000:04:02.3

Unfortunately, only the first type was used.
After restarting the nova-compute service, the log showed:
WARNING nova.virt.libvirt.driver [None req-a23d9cb4-6554-499c-9fcf-d7f9706535ef None None] The vGPU type 'nvidia-320' was listed in '[devices] enabled_vgpu_types' but no corresponding '[vgpu_nvidia-320]' group or '[vgpu_nvidia-320] device_addresses' option was defined. Only the first type 'nvidia-320' will be used.

This happens because _get_supported_vgpu_types() is called first, when the libvirt driver is instantiated [1], whereas the dynamic CONF options are only registered by init_host() [2], which is called afterwards.

[1] https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/libvirt/driver.py#L418

[2] https://github.com/openstack/nova/blob/90777d7/nova/compute/manager.py#L1405

A simple fix would be to make sure the dynamic options are registered within _get_supported_vgpu_types().
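
To illustrate the ordering problem and the proposed fix, here is a minimal, self-contained sketch. It is not nova code: ToyConf, register_dynamic_opts() and the register_first flag are hypothetical stand-ins, and the fallback to the first type only mimics what the warning above describes.

```python
class ToyConf:
    """Stand-in for a config object: a group must be registered before it is read."""

    def __init__(self, raw):
        self.raw = raw            # values as they appear in the config file
        self.registered = set()   # groups whose options have been declared

    def register_group(self, group):
        # Idempotent: registering the same group twice is harmless.
        self.registered.add(group)

    def get(self, group, option):
        if group not in self.registered:
            raise LookupError(f"no such group: {group}")
        return self.raw[group][option]


def register_dynamic_opts(conf):
    """Declare one 'vgpu_<type>' group per enabled type (hypothetical helper)."""
    for vgpu_type in conf.raw["devices"]["enabled_vgpu_types"]:
        conf.register_group(f"vgpu_{vgpu_type}")


def get_supported_vgpu_types(conf, register_first):
    """Return the usable types; fall back to the first one if a group is missing."""
    if register_first:
        register_dynamic_opts(conf)  # the fix: register inside the lookup
    enabled = conf.raw["devices"]["enabled_vgpu_types"]
    for vgpu_type in enabled:
        try:
            conf.get(f"vgpu_{vgpu_type}", "device_addresses")
        except LookupError:
            return [enabled[0]]      # mimics "Only the first type ... will be used"
    return enabled


RAW = {
    "devices": {"enabled_vgpu_types": ["nvidia-320", "nvidia-321"]},
    "vgpu_nvidia-320": {"device_addresses": ["0000:04:02.1", "0000:04:02.2"]},
    "vgpu_nvidia-321": {"device_addresses": ["0000:04:02.3"]},
}

# Buggy ordering: the lookup runs before anything has registered the groups.
buggy = get_supported_vgpu_types(ToyConf(RAW), register_first=False)

# Fixed ordering: the lookup registers the dynamic groups itself.
fixed = get_supported_vgpu_types(ToyConf(RAW), register_first=True)

print(buggy)
print(fixed)
```

Because registration in this sketch is idempotent, a later call to register_dynamic_opts() (the equivalent of init_host() doing its own registration) would not conflict with the earlier one.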

Tags: libvirt vgpu
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/758470

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/758470
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2bd8900d9b2600fba74341e249701051fb78eb37
Submitter: Zuul
Branch: master

commit 2bd8900d9b2600fba74341e249701051fb78eb37
Author: Sylvain Bauza <email address hidden>
Date: Thu Oct 15 19:19:38 2020 +0200

    Fix the vGPU dynamic options race

    As we lookup the existing dynamic options *before* creating them as
    _get_supported_vgpu_types() is called *before* compute init_host(),
    we need to make sure we call again the dynamic options registration
    within it.

    Change-Id: Ib9387c381d39fac389374c731b210815c6d4af03
    Closes-Bug: #1900006

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.

Sylvain Bauza (sylvain-bauza) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 22.2.1

This issue was fixed in the openstack/nova 22.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/831524

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/831524
Committed: https://opendev.org/openstack/nova/commit/2ddc2e6ab0259fdc1437998542fb5fd020dfef31
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 2ddc2e6ab0259fdc1437998542fb5fd020dfef31
Author: Sylvain Bauza <email address hidden>
Date: Thu Oct 15 19:19:38 2020 +0200

    Fix the vGPU dynamic options race

    As we lookup the existing dynamic options *before* creating them as
    _get_supported_vgpu_types() is called *before* compute init_host(),
    we need to make sure we call again the dynamic options registration
    within it.

    Change-Id: Ib9387c381d39fac389374c731b210815c6d4af03
    Closes-Bug: #1900006
    (cherry picked from commit 2bd8900d9b2600fba74341e249701051fb78eb37)
    (cherry picked from commit c7d9d6d9dd25e21ec76ceea294cdf1690686a086)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova ussuri-eol

This issue was fixed in the openstack/nova ussuri-eol release.
