vGPU instance is not reattached to guest on resume

Bug #1948705 reported by Gustavo Santos
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Gustavo Santos
Milestone: (none)

Bug Description

Description
===========
When suspending a guest with a vGPU instance attached to it, Nova asks libvirt to detach that device, since the guest can't be suspended otherwise. When resuming that guest, however, it comes back without the vGPU instance previously attached to it.

This happens because, at the time the feature was developed, libvirt did not support hot unplugging mediated devices [1]. That limitation has been lifted since then, and the resume operation needs to be modified in order to reattach the mediated device (vGPU instance) that was detached on suspend.

[1] https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L8003
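
The suspend/resume asymmetry described above can be illustrated with a small, self-contained sketch. This is not Nova's driver code: `FakeGuest`, `suspend`, `resume_without_fix` and `resume_with_fix` are hypothetical names, and passing the detached list from suspend to resume is an assumption made only to keep the example short.

```python
# Illustrative sketch only. FakeGuest and the functions below are
# hypothetical stand-ins, not Nova's or libvirt's real classes or APIs.
from dataclasses import dataclass, field


@dataclass
class FakeGuest:
    """Minimal stand-in for a guest that tracks attached mdev UUIDs."""
    mdevs: set = field(default_factory=set)
    suspended: bool = False

    def detach(self, uuid):
        self.mdevs.discard(uuid)

    def attach(self, uuid):
        self.mdevs.add(uuid)


def suspend(guest):
    """Detach every mdev before suspending and return what was detached."""
    detached = sorted(guest.mdevs)
    for uuid in detached:
        guest.detach(uuid)
    guest.suspended = True
    return detached


def resume_without_fix(guest):
    """Reported behavior: the guest resumes, but no mdev is reattached."""
    guest.suspended = False


def resume_with_fix(guest, detached):
    """Intended behavior: reattach the mdevs that suspend() detached."""
    guest.suspended = False
    for uuid in detached:
        guest.attach(uuid)
```

With `resume_without_fix` the guest ends up with an empty set of mdevs, which matches the actual result reported here; `resume_with_fix` restores the set that was present before the suspend.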

Steps to reproduce
==================
1 - Set up the environment in order to provide vGPU capabilities to Nova;
2 - Spawn an instance using a flavor that asks for a vGPU;
3 - Suspend the instance;
4 - Resume the instance.

Expected result
===============
After resuming the instance, it should still have a vGPU instance attached.

Actual result
=============
The instance comes back without a vGPU attached.
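
One way to confirm the result is to inspect the guest's libvirt domain XML (for example, the output of `virsh dumpxml` on the compute host) before the suspend and after the resume, and compare the mediated devices it lists. The helper below is a minimal sketch under that assumption; `mdev_uuids` is a hypothetical name, and the sample XML and UUID are made up for illustration.

```python
import xml.etree.ElementTree as ET


def mdev_uuids(domain_xml):
    """Return the UUIDs of mediated devices listed in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    uuids = []
    # Mediated devices appear as <hostdev type='mdev'> under <devices>.
    for hostdev in root.findall("./devices/hostdev[@type='mdev']"):
        address = hostdev.find("./source/address")
        if address is not None and address.get("uuid"):
            uuids.append(address.get("uuid"))
    return uuids


# Made-up sample documents standing in for real `virsh dumpxml` output.
BEFORE_SUSPEND = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source>
        <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

AFTER_RESUME = """
<domain type='kvm'>
  <devices/>
</domain>
"""
```

If the list for the XML captured before the suspend is non-empty and the list for the XML captured after the resume is empty, the vGPU was not reattached.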

Environment
===========
- Tested on StarlingX OpenStack - Ussuri; master code still does not provide a fix.
- Libvirt v4.7.0 + KVM.

Changed in nova:
assignee: nobody → Gustavo Santos (gooshtavow)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/815373

tags: added: vgpu
Changed in nova:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/815373
Committed: https://opendev.org/openstack/nova/commit/16f7c601b63bd1e7ca13917261300a7064ec72bc
Submitter: "Zuul (22348)"
Branch: master

commit 16f7c601b63bd1e7ca13917261300a7064ec72bc
Author: Gustavo Santos <email address hidden>
Date: Mon Oct 25 16:32:10 2021 -0300

    Reattach mdevs to guest on resume

    When suspending a VM in OpenStack, Nova detaches all the mediated
    devices from the guest machine, but does not reattach them on the resume
    operation. This patch makes Nova reattach the mdevs that were detached
    when the guest was suspended.

    This behavior is due to libvirt not supporting the hot-unplug of
    mediated devices at the time the feature was being developed. The
    limitation has been lifted since then, and now we have to amend the
    resume function so it will reattach the mediated devices that were
    detached on suspension.

    Closes-bug: #1948705

    Signed-off-by: Gustavo Santos <email address hidden>
    Change-Id: I083929f36d9e78bf7713a87cae6d581e0d946867

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/821126

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/821978

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/821980

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/821987

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.0.0.0rc1

This issue was fixed in the openstack/nova 25.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/821126
Committed: https://opendev.org/openstack/nova/commit/15c32e89e4f5ad7823c490c976075280c5dfccd9
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 15c32e89e4f5ad7823c490c976075280c5dfccd9
Author: Gustavo Santos <email address hidden>
Date: Mon Oct 25 16:32:10 2021 -0300

    Reattach mdevs to guest on resume

    When suspending a VM in OpenStack, Nova detaches all the mediated
    devices from the guest machine, but does not reattach them on the resume
    operation. This patch makes Nova reattach the mdevs that were detached
    when the guest was suspended.

    This behavior is due to libvirt not supporting the hot-unplug of
    mediated devices at the time the feature was being developed. The
    limitation has been lifted since then, and now we have to amend the
    resume function so it will reattach the mediated devices that were
    detached on suspension.

    Closes-bug: #1948705

    Signed-off-by: Gustavo Santos <email address hidden>
    Change-Id: I083929f36d9e78bf7713a87cae6d581e0d946867
    (cherry picked from commit 16f7c601b63bd1e7ca13917261300a7064ec72bc)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.1.1

This issue was fixed in the openstack/nova 24.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/821978
Committed: https://opendev.org/openstack/nova/commit/b9feb05c2fd501b8695130402f2770b0f5d5cb6a
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit b9feb05c2fd501b8695130402f2770b0f5d5cb6a
Author: Gustavo Santos <email address hidden>
Date: Mon Oct 25 16:32:10 2021 -0300

    Reattach mdevs to guest on resume

    When suspending a VM in OpenStack, Nova detaches all the mediated
    devices from the guest machine, but does not reattach them on the resume
    operation. This patch makes Nova reattach the mdevs that were detached
    when the guest was suspended.

    This behavior is due to libvirt not supporting the hot-unplug of
    mediated devices at the time the feature was being developed. The
    limitation has been lifted since then, and now we have to amend the
    resume function so it will reattach the mediated devices that were
    detached on suspension.

    Changes:
      doc/source/admin/virtual-gpu.rst
    NOTE(elod.illes): updated the doc to reflect the new state.

    Closes-bug: #1948705

    Signed-off-by: Gustavo Santos <email address hidden>
    Change-Id: I083929f36d9e78bf7713a87cae6d581e0d946867
    (cherry picked from commit 16f7c601b63bd1e7ca13917261300a7064ec72bc)
    (cherry picked from commit 15c32e89e4f5ad7823c490c976075280c5dfccd9)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/821987
Reason: stable/ussuri branch of openstack/nova transitioned to End of Life and is about to be deleted. To be able to do that, all open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova wallaby-eom

This issue was fixed in the openstack/nova wallaby-eom release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/821980
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.
