interface detach does not progress because libvirt does not complete the operation

Bug #1998274 reported by Takashi Kajinami
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Won't Fix
Undecided
Unassigned

Bug Description

Description
===========
# This might not be a nova bug but I'm wondering whether anyone in the team has any idea about $topic.

Currently some tests in heat CI are consistently failing. Looking at the failures, we found detaching a port from a interface does not progress.

example: https://zuul.opendev.org/t/openstack/build/301ed642a2374caf9a4f807952702a6a

~~~
Nov 29 10:14:44.823945 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Attempting to detach device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the persistent domain config. {{(pid=81163) _detach_from_persistent /opt/stack/nova/nova/virt/libvirt/driver.py:2445}}
Nov 29 10:14:44.824612 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] detach device xml: <interface type="ethernet">
Nov 29 10:14:44.829211 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] looking for interface given config: <interface type="ethernet"><mac address="fa:16:3e:21:bd:e7"/><model type="virtio"/><driver name="qemu"/><mtu size="1372"/><target dev="tape01452d4-15"/></interface> {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:257}}
Nov 29 10:14:44.832316 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] No interface of type: <class 'nova.virt.libvirt.config.LibvirtConfigGuestInterface'> found in domain {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:261}}
Nov 29 10:14:44.832606 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: INFO nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Successfully detached device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the persistent domain config.
Nov 29 10:14:44.832978 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] (1/8): Attempting to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2481}}
Nov 29 10:14:44.833403 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] detach device xml: <interface type="ethernet">
Nov 29 10:14:49.838217 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Start waiting for the detach event from libvirt for device tape01452d4-15 with device alias net0 for instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 {{(pid=81163) _detach_from_live_and_wait_for_event /opt/stack/nova/nova/virt/libvirt/driver.py:2557}}
Nov 29 10:15:09.840289 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: WARNING nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Waiting for libvirt event about the detach of device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 is timed out.
Nov 29 10:15:09.841061 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] looking for interface given config: <interface type="ethernet"><mac address="fa:16:3e:21:bd:e7"/><model type="virtio"/><driver name="qemu"/><mtu size="1372"/><target dev="tape01452d4-15"/></interface> {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:257}}
Nov 29 10:15:09.846547 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Failed to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. Libvirt did not report any error but the device is still in the config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2499}}
Nov 29 10:15:09.846792 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] (2/8): Attempting to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2481}}
Nov 29 10:15:09.847211 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] detach device xml: <interface type="ethernet">
Nov 29 10:15:09.851474 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Libvirt returned error while detaching device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733. Libvirt error code: 1, error message: internal error: unable to execute QEMU command 'device_del': Device net0 is already in the process of unplug. {{(pid=81163) _detach_sync /opt/stack/nova/nova/virt/libvirt/driver.py:2625}}
...
Nov 29 10:17:09.915817 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Ignoring QEMU rejecting our request to detach device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 as it is caused by a previous request still being in progress. {{(pid=81163) _detach_sync /opt/stack/nova/nova/virt/libvirt/driver.py:2662}}
Nov 29 10:17:09.916105 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Start waiting for the detach event from libvirt for device tape01452d4-15 with device alias net0 for instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 {{(pid=81163) _detach_from_live_and_wait_for_event /opt/stack/nova/nova/virt/libvirt/driver.py:2557}}
Nov 29 10:17:29.917420 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: WARNING nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Waiting for libvirt event about the detach of device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 is timed out.
Nov 29 10:17:29.918025 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] looking for interface given config: <interface type="ethernet"><mac address="fa:16:3e:21:bd:e7"/><model type="virtio"/><driver name="qemu"/><mtu size="1372"/><target dev="tape01452d4-15"/></interface> {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:257}}
Nov 29 10:17:29.920324 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Failed to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. Libvirt did not report any error but the device is still in the config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2499}}
Nov 29 10:17:29.920593 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: ERROR nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Run out of retry while detaching device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. Device is still attached to the guest.
~~~

The port was previously attached to 4bd20bdd-e186-480b-adb0-ee40f3d78ffb , and nova succeeded to detach it from that instance so this might be intermittent issue rather than consistent one, though the job has been blocked for recent multiple attempts.

Steps to reproduce
==================
* (TBD)

Expected result
===============
* Interface detach completes

Actual result
=============
* Interface detach does not progress and times out

Environment
===========
* Current master + Ubuntu Jammy
* ML2 + OVN is used

Logs & Configs
==============

Revision history for this message
Takashi Kajinami (kajinamit) wrote (last edit ):
Download full text (8.5 KiB)

I've raised the device_detach_attempts/timeout options and seems that eases the problem.

However now I face a different exception which looks legit.

https://zuul.opendev.org/t/openstack/build/71c5de87297549c296ca58b720105a30

~~~
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server [None req-2c96ae71-9987-4e9c-97e7-0890244d8e63 demo demo] Exception during message handling: KeyError: AsyncDeviceEventsHandler.Waiter(instance_uuid=e9fa62fb-46fb-4bfd-a22c-3dc98a88b0e6, device_name=net0, event_types={<class 'nova.virt.libvirt.event.DeviceRemovalFailedEvent'>, <class 'nova.virt.libvirt.event.DeviceRemovedEvent'>})
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/opt/stack/oslo.messaging/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/opt/stack/oslo.messaging/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/opt/stack/oslo.messaging/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 65, in wrapped
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server with excutils.save_and_reraise_exception():
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.10/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server self.force_reraise()
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.10/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server raise self.value
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 63, in wrapped
Nov 30 02:30:04.180887 ubuntu-jammy-iweb-mtl01-0032337526 nova-compute[80934]: ER...

Read more...

Revision history for this message
Elod Illes (elod-illes) wrote :

Hi Takashi, I think this is duplicate of https://bugs.launchpad.net/nova/+bug/1998148

Revision history for this message
Takashi Kajinami (kajinamit) wrote :

Yeah. I agree this is likely the same libvirt issue which is preventing device detach

(this bug is an issue with network interface detach while that one is one with volume detach, which are processed similarly by libvirt, I guess)

I'm ok with closing this as a duplicate of this bug, but if it does not harm then I'd prefer keeping this bug open until one of these two are sorted (so that we don't forget to ensure the solution for one bug works for the other.)

Revision history for this message
Uggla (rene-ribaud) wrote :

Mark it at won't fix because it is a duplicate bug.

Changed in nova:
status: New → Confirmed
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.