Detaching volume from the live domain: virDomainDetachDeviceFlags(): libvirt.libvirtError: operation failed: disk vdb not found

Bug #1931716 reported by Balazs Gibizer
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Balazs Gibizer

Bug Description

Description
===========

The test_attach_attached_volume_to_same_server test within the nova-next job fails while detaching the volume:

File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 1515, in detachDeviceFlags

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] raise libvirtError('virDomainDetachDeviceFlags() failed')

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] libvirt.libvirtError: operation failed: disk vdb not found
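
For reference, the failing call is libvirt's virDomainDetachDeviceFlags(), which nova reaches through the libvirt-python bindings. A minimal standalone sketch of the same call (the domain name and disk XML below are illustrative, not taken from the failed run):

    import libvirt

    # Connect to the local QEMU/KVM hypervisor.
    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000076')  # illustrative name

    # Minimal <disk> element identifying the device; libvirt matches it
    # against the domain definition by target dev.
    disk_xml = """
    <disk type='block' device='disk'>
      <target dev='vdb' bus='virtio'/>
    </disk>
    """

    # Detach from both the live and the persistent config. If the disk is
    # already gone from the live domain, this raises libvirt.libvirtError
    # with "operation failed: disk vdb not found".
    flags = libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG
    try:
        dom.detachDeviceFlags(disk_xml, flags=flags)
    except libvirt.libvirtError as ex:
        print('detach failed: code=%s msg=%s'
              % (ex.get_error_code(), ex.get_error_message()))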

Steps to reproduce
==================

Only seen as part of the nova-next job at present.

Expected result
===============

detach succeeds

Actual result
=============

test fails as the volume cannot be detached (see the libvirt error in the logs below)

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

master

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

libvirt

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

cinder volume

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

N/A

Logs & Configs
==============

https://zuul.opendev.org/t/openstack/build/02e6a99bf1574c978c663eb434705cbb/log/controller/logs/screen-n-cpu.txt?severity=0#34811

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [None req-aceedc3b-f03b-4bd5-8e8e-cfe66170a2e2 tempest-AttachVolumeNegativeTest-1273499975 tempest-AttachVolumeNegativeTest-1273499975-project] [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] Failed to detach volume 2939aa92-0156-45d6-9689-ed48dcc8fd8a from /dev/vdb: libvirt.libvirtError: operation failed: disk vdb not found

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] Traceback (most recent call last):

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/block_device.py", line 328, in driver_detach

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] virt_driver.detach_volume(context, connection_info, instance, mp,

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2555, in detach_volume

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] self._detach_with_retry(

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2308, in _detach_with_retry

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] self._detach_from_live_with_retry(

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2364, in _detach_from_live_with_retry

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] self._detach_from_live_and_wait_for_event(

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2426, in _detach_from_live_and_wait_for_event

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] self._detach_sync(

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2498, in _detach_sync

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] guest.detach_device(dev, persistent=persistent, live=live)

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 441, in detach_device

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] self._domain.detachDeviceFlags(device_xml, flags=flags)

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] result = proxy_call(self._autowrap, f, *args, **kwargs)

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] rv = execute(f, *args, **kwargs)

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] six.reraise(c, e, tb)

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] raise value

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] rv = meth(*args, **kwargs)

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 1515, in detachDeviceFlags

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] raise libvirtError('virDomainDetachDeviceFlags() failed')

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5] libvirt.libvirtError: operation failed: disk vdb not found

Jun 11 13:30:47.071650 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: ERROR nova.virt.block_device [instance: e3d6fd2b-7357-4d19-bcb4-482d97d332e5]

Tags: gate-failure
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The testcase:
1) attaches a volume
2) tries to attach the same volume again and expects it to fail
3) cleans up the resources used in the test, starting by detaching the volume (the three steps are sketched below)
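
A rough sketch of those three steps using the openstacksdk (the cloud name is hypothetical, the IDs are taken from the log above, and the real test lives in tempest):

    import openstack

    # Hypothetical clouds.yaml entry; IDs below come from this bug's logs.
    conn = openstack.connect(cloud='devstack')
    server_id = 'e3d6fd2b-7357-4d19-bcb4-482d97d332e5'
    volume_id = '2939aa92-0156-45d6-9689-ed48dcc8fd8a'

    # 1) Attach the volume to the server.
    attachment = conn.compute.create_volume_attachment(
        server_id, volume_id=volume_id)

    # 2) Attaching the same volume again is expected to fail, as the
    #    volume is already 'in-use'.
    try:
        conn.compute.create_volume_attachment(server_id, volume_id=volume_id)
    except openstack.exceptions.HttpException:
        pass  # expected failure

    # 3) Cleanup: detach the volume -- the step that fails in this bug.
    conn.compute.delete_volume_attachment(attachment, server_id)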

From the log it is visible that the attach (1) and the detach (3) happen right after each other from the libvirt driver's perspective:

Jun 11 13:30:46.398669 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: DEBUG oslo_concurrency.lockutils [None req-382bb669-3d84-4d09-9b01-870ab132642e tempest-AttachVolumeNegativeTest-1273499975 tempest-AttachVolumeNegativeTest-1273499975-project] Lock "e3d6fd2b-7357-4d19-bcb4-482d97d332e5" released by "nova.compute.manager.ComputeManager.attach_volume.<locals>.do_attach_volume" :: held 6.947s {{(pid=108182) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:367}}
Jun 11 13:30:46.906588 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: DEBUG oslo_concurrency.lockutils [None req-aceedc3b-f03b-4bd5-8e8e-cfe66170a2e2 tempest-AttachVolumeNegativeTest-1273499975 tempest-AttachVolumeNegativeTest-1273499975-project] Lock "e3d6fd2b-7357-4d19-bcb4-482d97d332e5" acquired by "nova.compute.manager.ComputeManager.detach_volume.<locals>.do_detach_volume" :: waited 0.000s {{(pid=108182) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:355}}

During detach:
1) the volume is found both in the persistent and in the live domain,
2) it is detached from the persistent domain successfully,
3) the detach from the live domain fails because the device is not found.

So somehow the device disappeared from the live domain between steps 1) and 3).
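
The persistent vs. live configuration can be inspected directly through the libvirt bindings; a minimal sketch (illustrative domain name, crude string match):

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000076')  # illustrative

    # Live (running) config vs. persistent (on-disk) config.
    live_xml = dom.XMLDesc(0)
    persistent_xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_INACTIVE)

    for name, xml in (('live', live_xml), ('persistent', persistent_xml)):
        print('%s domain has vdb: %s' % (name, "<target dev='vdb'" in xml))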

Jun 11 13:30:47.028442 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: DEBUG nova.virt.libvirt.driver [None req-aceedc3b-f03b-4bd5-8e8e-cfe66170a2e2 tempest-AttachVolumeNegativeTest-1273499975 tempest-AttachVolumeNegativeTest-1273499975-project] Attempting to detach device vdb from instance e3d6fd2b-7357-4d19-bcb4-482d97d332e5 from the persistent domain config. {{(pid=108182) _detach_from_persistent /opt/stack/nova/nova/virt/libvirt/driver.py:2322}}
Jun 11 13:30:47.045014 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: WARNING nova.virt.libvirt.driver [None req-171cb636-ab52-4632-a7b3-791246cae8f4 None None] Received event <DeviceRemovedEvent: 1623418247.0404575, e3d6fd2b-7357-4d19-bcb4-482d97d332e5 => virtio-disk1> from libvirt but the driver is not waiting for it; ignored.
Jun 11 13:30:47.049596 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: INFO nova.virt.libvirt.driver [None req-aceedc3b-f03b-4bd5-8e8e-cfe66170a2e2 tempest-AttachVolumeNegativeTest-1273499975 tempest-AttachVolumeNegativeTest-1273499975-project] Successfully detached device vdb from instance e3d6fd2b-7357-4d19-bcb4-482d97d332e5 from the persistent domain config.
Jun 11 13:30:47.050139 ubuntu-focal-inap-mtl01-0025074130 nova-compute[108182]: DEBUG nova.virt.libvirt.driver [None req-aceedc3b-f03b-4bd5-8e8e-cfe66170a2e2 tempest-AttachVolumeNegativeTest-1273499975 tempest-AttachVolumeNegativeTest-1273499975-project] (1/8): Attempting to detach device vdb with device alias virtio-disk1 from instance e3d6fd2b-7357-4d19-bcb4-482d97d332e5 from the live domain config. {{(pid=108182...


Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

This is the libvirt log from around that time:

2021-06-11 13:30:47.026+0000: 72826: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1623418247, "microseconds": 26110}, "event": "DEVICE_DELETED", "data": {"device": "virtio-disk1", "path": "/machine/peripheral/virtio-disk1"}}]
2021-06-11 13:30:47.026+0000: 72826: info : qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7f9460079750 event={"timestamp": {"seconds": 1623418247, "microseconds": 26110}, "event": "DEVICE_DELETED", "data": {"device": "virtio-disk1", "path": "/machine/peripheral/virtio-disk1"}}
2021-06-11 13:30:47.026+0000: 72826: debug : qemuMonitorJSONIOProcessEvent:181 : mon=0x7f9460079750 obj=0x5628e9e75bd0
2021-06-11 13:30:47.026+0000: 72826: debug : qemuMonitorEmitEvent:1198 : mon=0x7f9460079750 event=DEVICE_DELETED
2021-06-11 13:30:47.026+0000: 72826: debug : qemuProcessHandleEvent:549 : vm=0x7f946005e590
2021-06-11 13:30:47.026+0000: 72826: debug : virObjectEventNew:631 : obj=0x5628e9ea3460
2021-06-11 13:30:47.026+0000: 72826: debug : qemuMonitorJSONIOProcessEvent:205 : handle DEVICE_DELETED handler=0x7f94743f6b30 data=0x5628e9e6e360
2021-06-11 13:30:47.026+0000: 72826: debug : qemuMonitorEmitDeviceDeleted:1432 : mon=0x7f9460079750
2021-06-11 13:30:47.026+0000: 72826: debug : qemuProcessHandleDeviceDeleted:1362 : Device virtio-disk1 removed from domain 0x7f946005e590 instance-00000076
2021-06-11 13:30:47.026+0000: 119305: debug : qemuProcessEventHandler:4866 : vm=0x7f946005e590, event=2
2021-06-11 13:30:47.026+0000: 119305: debug : processDeviceDeletedEvent:4282 : Removing device virtio-disk1 from domain 0x7f946005e590 instance-00000076
2021-06-11 13:30:47.026+0000: 119305: debug : qemuDomainObjBeginJobInternal:9416 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7f946005e590 name=instance-00000076, current job=none agentJob=none async=none)
2021-06-11 13:30:47.026+0000: 119305: debug : qemuDomainObjBeginJobInternal:9470 : Started job: modify (async=none vm=0x7f946005e590 name=instance-00000076)
2021-06-11 13:30:47.026+0000: 119305: debug : qemuDomainRemoveDiskDevice:4218 : Removing disk virtio-disk1 from domain 0x7f946005e590 instance-00000076
2021-06-11 13:30:47.026+0000: 119305: debug : qemuDomainObjEnterMonitorInternal:9869 : Entering monitor (mon=0x7f9460079750 vm=0x7f946005e590 name=instance-00000076)
2021-06-11 13:30:47.026+0000: 119305: debug : qemuMonitorBlockdevDel:4417 : nodename=libvirt-4-format
2021-06-11 13:30:47.026+0000: 119305: debug : qemuMonitorBlockdevDel:4419 : mon:0x7f9460079750 vm:0x7f946005e590 fd:35
2021-06-11 13:30:47.026+0000: 119305: info : qemuMonitorSend:993 : QEMU_MONITOR_SEND_MSG: mon=0x7f9460079750 msg={"execute":"blockdev-del","arguments":{"node-name":"libvirt-4-format"},"id":"libvirt-378"} fd=-1
2021-06-11 13:30:47.026+0000: 72826: debug : virObjectEventDispose:124 : obj=0x5628e9ea3460
2021-06-11 13:30:47.026+0000: 72826: info : qemuMonitorIOWrite:450 : QEMU_MONITOR_IO_WRITE: mon=0x7f9460079750 buf={"execute":"blockdev-del","arguments":{"node-name":"libvirt-4-format"},"id":"libvirt-378"} len=92 ret=92 errno=0

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/796043

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Nova logs

Unexpected libvirt error while detaching device vdb from instance e3d6fd2b-7357-4d19-bcb4-482d97d332e5: operation failed: disk vdb not found: libvirt.libvirtError: operation failed: disk vdb not found

This means the code hit an unrecognized error[1]. It is not DEVICE_MISSING (99)[2], even though the error message suggests that.

I will push a patch that logs the error code so we can add it to the handled errors.

[1] https://github.com/openstack/nova/blob/245433b826cc7178512c913df59caa9e73895f60/nova/virt/libvirt/driver.py#L2540
[2] https://github.com/openstack/nova/blob/245433b826cc7178512c913df59caa9e73895f60/nova/virt/libvirt/driver.py#L2502
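
A minimal sketch of what that logging could look like (hypothetical helper; the actual change is the related fix linked above, merged below):

    import libvirt
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def detach_and_log(dom, device_xml, device_name, flags):
        """Detach a device, logging the libvirt error code on failure."""
        try:
            dom.detachDeviceFlags(device_xml, flags=flags)
        except libvirt.libvirtError as ex:
            # Surfacing the numeric code lets unhandled codes (like
            # OPERATION_FAILED here) be added to the handled set later.
            LOG.debug('Libvirt error while detaching device %s: '
                      'code=%s message=%s',
                      device_name, ex.get_error_code(),
                      ex.get_error_message())
            raise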

Revision history for this message
melanie witt (melwitt) wrote :

TBH "operation failed: disk vdb not found: libvirt.libvirtError: operation failed: disk vdb not found" sounds like one of the old error codes that we removed in the recent-ish past. Old code [1]:

    except libvirt.libvirtError as ex:
        with excutils.save_and_reraise_exception(reraise=False) as ctx:
            errcode = ex.get_error_code()
            # TODO(lyarwood): Remove libvirt.VIR_ERR_OPERATION_FAILED
            # and libvirt.VIR_ERR_INTERNAL_ERROR once
            # MIN_LIBVIRT_VERSION is >= 4.1.0
            if supports_device_missing_error_code:
                unplug_libvirt_error_codes.add(
                    libvirt.VIR_ERR_DEVICE_MISSING)
            if errcode in unplug_libvirt_error_codes:
                # TODO(lyarwood): Remove the following error message
                # check once we only care about VIR_ERR_DEVICE_MISSING
                errmsg = ex.get_error_message()
                if 'not found' in errmsg:
                    # This will be raised if the live domain
                    # detach fails because the device is not found
                    raise exception.DeviceNotFound(
                        device=alternative_device_name)

[1] https://github.com/openstack/nova/blob/dd1e6d4b0cee465fd89744e306fcd25228b3f7cc/nova/virt/libvirt/guest.py#L417-L427

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Lee found the codepath in libvirt that still uses OPERATION_FAILED: https://listman.redhat.com/archives/libvir-list/2021-June/msg00337.html

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/796255

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
assignee: nobody → Balazs Gibizer (balazs-gibizer)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/796043
Committed: https://opendev.org/openstack/nova/commit/7c7682149940465acb3d9c5023262af97886055c
Submitter: "Zuul (22348)"
Branch: master

commit 7c7682149940465acb3d9c5023262af97886055c
Author: Balazs Gibizer <email address hidden>
Date: Fri Jun 11 18:10:12 2021 +0200

    Add debug log for device detach libvirt error

    We see a new way of virDomainDetachDeviceFlags() to fail in bug 1931716.
    This patch logs the libvirt error code and error message.

    Change-Id: I7ec4ea68a933d1cb415c4adee97a508cf9611468
    Related-Bug: #1931716

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/796255
Committed: https://opendev.org/openstack/nova/commit/eb67ec24e31d79749f2fa515fa69d861dfb12244
Submitter: "Zuul (22348)"
Branch: master

commit eb67ec24e31d79749f2fa515fa69d861dfb12244
Author: Balazs Gibizer <email address hidden>
Date: Mon Jun 14 12:29:18 2021 +0200

    Handle OPERATION_FAILED error during detach

    While libvirt 4.1 started to use DEVICE_MISSING for most of the cases
    there is still one[1] using OPERATION_FAILED when the device is missing
    during detach. So nova needs to handle that properly.

    Revert this when libvirt merges[1] and nova bumps the min version over
    to that release.

    [1] https://listman.redhat.com/archives/libvir-list/2021-June/msg00337.html

    Change-Id: Ibc967be291853e23027d9b8210ece572adcc80c9
    Closes-Bug: #1931716
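
Following the pattern of the old guest.py code quoted earlier, the handling boils down to treating OPERATION_FAILED with a 'not found' message the same as DEVICE_MISSING. A hedged sketch (the exception class is a stand-in; the real change lives in nova's libvirt driver):

    import libvirt

    class DeviceNotFound(Exception):
        """Stand-in for nova.exception.DeviceNotFound in this sketch."""

    def detach_handling_missing(dom, device_xml, device_name, flags):
        try:
            dom.detachDeviceFlags(device_xml, flags=flags)
        except libvirt.libvirtError as ex:
            code = ex.get_error_code()
            if (code == libvirt.VIR_ERR_DEVICE_MISSING or
                    (code == libvirt.VIR_ERR_OPERATION_FAILED and
                     'not found' in ex.get_error_message())):
                # The device already vanished from the domain; treat it
                # as an already-completed detach.
                raise DeviceNotFound(device_name)
            raise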

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.0.0.0rc1

This issue was fixed in the openstack/nova 24.0.0.0rc1 release candidate.
