libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device $device is already in the process of unplug

Bug #1923206 reported by Lee Yarwood
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Lee Yarwood
Wallaby                   Fix Released  Undecided   Unassigned

Bug Description

Description
===========
This was initially reported downstream against QEMU in the following bug:

Get libvirtError "Device XX is already in the process of unplug" when detach device in OSP env
https://bugzilla.redhat.com/show_bug.cgi?id=1878659

I first saw the error crop up while testing q35 in TripleO in the following job:

https://c6b36562677324bf8249-804f3f4695b3063292bbb3235f424ae0.ssl.cf1.rackcdn.com/785027/5/check/tripleo-ci-centos-8-standalone/6860050/logs/undercloud/var/log/containers/nova/nova-compute.log

2021-04-09 11:09:53.702 8 DEBUG nova.virt.libvirt.guest [req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 d4d933c7b10c462c8141820b0e70822b - default default] Attempting initial detach for device vdb detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:455
[..]
2021-04-09 11:09:58.721 8 DEBUG nova.virt.libvirt.guest [req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 d4d933c7b10c462c8141820b0e70822b - default default] Start retrying detach until device vdb is gone. detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:471
[..]
2021-04-09 11:09:58.729 8 ERROR oslo.service.loopingcall libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug

Steps to reproduce
==================
Unclear at present. It looks like a genuine QEMU bug: a repeat device_del request for a device now fails, instead of being ignored as it was previously. I've asked for clarification in the downstream QEMU bug.

Expected result
===============
Repeat calls to device_del are ignored by QEMU, or the resulting error is raised but then ignored by Nova.

Actual result
=============
Repeat calls to device_del cause an error to be raised to Nova via libvirt. Nova then fails the detach, even though the detach still succeeds asynchronously within QEMU.
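
The failure mode can be modelled in a few lines of Python. This is a toy sketch, not QEMU, libvirt, or Nova code: `FakeDomain`, `UnplugInProgress`, `naive_detach`, and the tick count are hypothetical names and values chosen purely for illustration. It assumes the unplug takes more than one retry interval to complete, so the second device_del arrives while the first is still pending.

```python
class UnplugInProgress(Exception):
    """Hypothetical stand-in for the error raised on a repeat device_del."""


class FakeDomain:
    """Toy model of a guest whose device unplug completes asynchronously."""

    def __init__(self, devices, ticks_to_unplug=2):
        self.devices = set(devices)
        self.ticks_to_unplug = ticks_to_unplug
        self._pending = {}  # device -> ticks left until the unplug finishes

    def device_del(self, dev):
        if dev not in self.devices:
            raise LookupError(f"Device {dev} not found")
        if dev in self._pending:
            # The behaviour described in this report: a repeat request fails.
            raise UnplugInProgress(
                f"Device {dev} is already in the process of unplug")
        self._pending[dev] = self.ticks_to_unplug

    def tick(self):
        """Advance time by one step; finished unplugs remove the device."""
        for dev in list(self._pending):
            self._pending[dev] -= 1
            if self._pending[dev] <= 0:
                del self._pending[dev]
                self.devices.discard(dev)


def naive_detach(domain, dev, max_attempts=3):
    """Retry device_del without tolerating the in-progress error."""
    for _ in range(max_attempts):
        domain.device_del(dev)
        domain.tick()
        if dev not in domain.devices:
            return True
    return False


domain = FakeDomain(["vdb"])
try:
    naive_detach(domain, "vdb")
    outcome = "detached"
except UnplugInProgress as exc:
    outcome = f"failed: {exc}"

# The caller saw a failure, yet the unplug still completes in the background.
domain.tick()
device_still_present = "vdb" in domain.devices
```

In this model the caller reports a failed detach, while `device_still_present` ends up false once the pending unplug finishes.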

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   libvirt + QEMU/KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   N/A

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============
See above.

melanie witt (melwitt) wrote :
Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/785682
Committed: https://opendev.org/openstack/nova/commit/0a7d3794c6dc39976b4cbfe12b1688230ac895a8
Submitter: "Zuul (22348)"
Branch: master

commit 0a7d3794c6dc39976b4cbfe12b1688230ac895a8
Author: Lee Yarwood <email address hidden>
Date: Fri Apr 9 15:37:23 2021 +0100

    libvirt: Ignore device already in the process of unplug errors

    At present QEMU will raise an error to libvirt when a device_del request
    is made for a device that has already partially detached through a
    previous request. This is outlined in more detail in the following
    downstream Red Hat QEMU bug report:

    Get libvirtError "Device XX is already in the process of unplug" [..]
    https://bugzilla.redhat.com/show_bug.cgi?id=1878659

    Within Nova we can actually ignore this error and allow our existing
    retry logic to attempt again after a short wait, hopefully allowing the
    original request to complete removing the device from the domain.

    This change does this and should result in one of the following
    device_del requests raising a VIR_ERR_DEVICE_MISSING error from libvirt.
    _try_detach_device should then translate that libvirt error into a
    DeviceNotFound exception which is itself then ignored by all
    detach_device_with_retry callers and taken to mean that the device has
    detached successfully.

    Closes-Bug: #1923206
    Change-Id: I0e068043d8267ab91535413d950a3e154c2234f7
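
The approach described in this commit message can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the merged Nova change: `FakeLibvirtError`, `is_unplug_in_progress`, `detach_with_retry`, and `make_scripted_detach` are hypothetical names, the `DeviceNotFound` class is a local stand-in for Nova's exception, and the numeric error codes are placeholders for the `libvirt.VIR_ERR_INTERNAL_ERROR` and `libvirt.VIR_ERR_DEVICE_MISSING` constants. The only libvirt API surface it assumes is the `get_error_code()` and `get_error_message()` accessors on `libvirtError`.

```python
import time

# Placeholder values for illustration; real code would use the
# libvirt.VIR_ERR_INTERNAL_ERROR and libvirt.VIR_ERR_DEVICE_MISSING constants.
VIR_ERR_INTERNAL_ERROR = 1
VIR_ERR_DEVICE_MISSING = 99


class FakeLibvirtError(Exception):
    """Stand-in exposing the two accessors used on libvirt.libvirtError."""

    def __init__(self, code, message):
        super().__init__(message)
        self._code = code
        self._message = message

    def get_error_code(self):
        return self._code

    def get_error_message(self):
        return self._message


class DeviceNotFound(Exception):
    """Local stand-in for Nova's DeviceNotFound exception."""


def is_unplug_in_progress(err):
    """True for the QEMU 'already in the process of unplug' error."""
    return (err.get_error_code() == VIR_ERR_INTERNAL_ERROR
            and "already in the process of unplug" in err.get_error_message())


def detach_with_retry(detach, device, max_attempts=5, interval=0.0):
    """Call detach(device) until libvirt reports the device as missing."""
    for _ in range(max_attempts):
        try:
            detach(device)
        except FakeLibvirtError as err:
            if err.get_error_code() == VIR_ERR_DEVICE_MISSING:
                # Callers treat this as "the device has detached".
                raise DeviceNotFound(device) from err
            if not is_unplug_in_progress(err):
                raise  # any other libvirt error is still fatal
            # Otherwise ignore the error and let the next attempt run.
        time.sleep(interval)
    raise RuntimeError(f"{device} still attached after {max_attempts} attempts")


def make_scripted_detach(responses):
    """Build a detach callable that replays a fixed list of outcomes."""
    remaining = iter(responses)

    def detach(device):
        response = next(remaining)
        if response is not None:
            raise response

    return detach


scripted = make_scripted_detach([
    None,  # the initial request is accepted
    FakeLibvirtError(
        VIR_ERR_INTERNAL_ERROR,
        "internal error: unable to execute QEMU command 'device_del': "
        "Device virtio-disk1 is already in the process of unplug"),
    FakeLibvirtError(VIR_ERR_DEVICE_MISSING, "device not found"),
])

try:
    detach_with_retry(scripted, "vdb")
    detached = False
except DeviceNotFound:
    detached = True
```

Matching on the message text is fragile, but in this sketch it is the only way to tell this case apart from other internal errors that share the same error code.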

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/786483

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/786483
Committed: https://opendev.org/openstack/nova/commit/972a86d61f6b6f0f3d1af549b081854e6ff016bc
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 972a86d61f6b6f0f3d1af549b081854e6ff016bc
Author: Lee Yarwood <email address hidden>
Date: Fri Apr 9 15:37:23 2021 +0100

    libvirt: Ignore device already in the process of unplug errors

    At present QEMU will raise an error to libvirt when a device_del request
    is made for a device that has already partially detached through a
    previous request. This is outlined in more detail in the following
    downstream Red Hat QEMU bug report:

    Get libvirtError "Device XX is already in the process of unplug" [..]
    https://bugzilla.redhat.com/show_bug.cgi?id=1878659

    Within Nova we can actually ignore this error and allow our existing
    retry logic to attempt again after a short wait, hopefully allowing the
    original request to complete removing the device from the domain.

    This change does this and should result in one of the following
    device_del requests raising a VIR_ERR_DEVICE_MISSING error from libvirt.
    _try_detach_device should then translate that libvirt error into a
    DeviceNotFound exception which is itself then ignored by all
    detach_device_with_retry callers and taken to mean that the device has
    detached successfully.

    Closes-Bug: #1923206
    Change-Id: I0e068043d8267ab91535413d950a3e154c2234f7
    (cherry picked from commit 0a7d3794c6dc39976b4cbfe12b1688230ac895a8)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/788467

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/788468

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/788469

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.1

This issue was fixed in the openstack/nova 23.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/c/openstack/nova/+/793044

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/stein)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/stein
Review: https://review.opendev.org/c/openstack/nova/+/793044
Reason: This branch transitioned to End of Life for this project; open patches need to be closed to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/788469
Reason: The stable/train branch of the nova project has been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/788468
Reason: stable/ussuri branch of openstack/nova transitioned to End of Life and is about to be deleted. To be able to do that, all open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/788467
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.
