Description
===========
# This might not be a nova bug but I'm wondering whether anyone in the team has any idea about $topic.
Currently some tests in heat CI are consistently failing. Looking at the failures, we found detaching a port from a interface does not progress.
example: https://zuul.opendev.org/t/openstack/build/301ed642a2374caf9a4f807952702a6a
~~~
Nov 29 10:14:44.823945 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Attempting to detach device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the persistent domain config. {{(pid=81163) _detach_from_persistent /opt/stack/nova/nova/virt/libvirt/driver.py:2445}}
Nov 29 10:14:44.824612 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] detach device xml: <interface type="ethernet">
Nov 29 10:14:44.829211 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] looking for interface given config: <interface type="ethernet"><mac address="fa:16:3e:21:bd:e7"/><model type="virtio"/><driver name="qemu"/><mtu size="1372"/><target dev="tape01452d4-15"/></interface> {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:257}}
Nov 29 10:14:44.832316 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] No interface of type: <class 'nova.virt.libvirt.config.LibvirtConfigGuestInterface'> found in domain {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:261}}
Nov 29 10:14:44.832606 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: INFO nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Successfully detached device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the persistent domain config.
Nov 29 10:14:44.832978 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] (1/8): Attempting to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2481}}
Nov 29 10:14:44.833403 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] detach device xml: <interface type="ethernet">
Nov 29 10:14:49.838217 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Start waiting for the detach event from libvirt for device tape01452d4-15 with device alias net0 for instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 {{(pid=81163) _detach_from_live_and_wait_for_event /opt/stack/nova/nova/virt/libvirt/driver.py:2557}}
Nov 29 10:15:09.840289 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: WARNING nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Waiting for libvirt event about the detach of device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 is timed out.
Nov 29 10:15:09.841061 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] looking for interface given config: <interface type="ethernet"><mac address="fa:16:3e:21:bd:e7"/><model type="virtio"/><driver name="qemu"/><mtu size="1372"/><target dev="tape01452d4-15"/></interface> {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:257}}
Nov 29 10:15:09.846547 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Failed to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. Libvirt did not report any error but the device is still in the config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2499}}
Nov 29 10:15:09.846792 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] (2/8): Attempting to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2481}}
Nov 29 10:15:09.847211 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] detach device xml: <interface type="ethernet">
Nov 29 10:15:09.851474 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Libvirt returned error while detaching device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733. Libvirt error code: 1, error message: internal error: unable to execute QEMU command 'device_del': Device net0 is already in the process of unplug. {{(pid=81163) _detach_sync /opt/stack/nova/nova/virt/libvirt/driver.py:2625}}
...
Nov 29 10:17:09.915817 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Ignoring QEMU rejecting our request to detach device tape01452d4-15 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 as it is caused by a previous request still being in progress. {{(pid=81163) _detach_sync /opt/stack/nova/nova/virt/libvirt/driver.py:2662}}
Nov 29 10:17:09.916105 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Start waiting for the detach event from libvirt for device tape01452d4-15 with device alias net0 for instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 {{(pid=81163) _detach_from_live_and_wait_for_event /opt/stack/nova/nova/virt/libvirt/driver.py:2557}}
Nov 29 10:17:29.917420 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: WARNING nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Waiting for libvirt event about the detach of device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 is timed out.
Nov 29 10:17:29.918025 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.guest [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] looking for interface given config: <interface type="ethernet"><mac address="fa:16:3e:21:bd:e7"/><model type="virtio"/><driver name="qemu"/><mtu size="1372"/><target dev="tape01452d4-15"/></interface> {{(pid=81163) get_interface_by_cfg /opt/stack/nova/nova/virt/libvirt/guest.py:257}}
Nov 29 10:17:29.920324 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: DEBUG nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Failed to detach device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. Libvirt did not report any error but the device is still in the config. {{(pid=81163) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2499}}
Nov 29 10:17:29.920593 ubuntu-jammy-rax-dfw-0032320806 nova-compute[81163]: ERROR nova.virt.libvirt.driver [None req-3a1d745d-ca71-4dd3-9f75-d9adac3c5cdb demo demo] Run out of retry while detaching device tape01452d4-15 with device alias net0 from instance 5cf9dfbe-e6cd-414b-a868-888aef23f733 from the live domain config. Device is still attached to the guest.
~~~
The port was previously attached to 4bd20bdd-e186-480b-adb0-ee40f3d78ffb , and nova succeeded to detach it from that instance so this might be intermittent issue rather than consistent one, though the job has been blocked for recent multiple attempts.
Steps to reproduce
==================
* (TBD)
Expected result
===============
* Interface detach completes
Actual result
=============
* Interface detach does not progress and times out
Environment
===========
* Current master + Ubuntu Jammy
* ML2 + OVN is used
Logs & Configs
==============
I've raised the device_ detach_ attempts/ timeout options and seems that eases the problem.
However now I face a different exception which looks legit.
https:/ /zuul.opendev. org/t/openstack /build/ 71c5de87297549c 296ca58b720105a 30
~~~ jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server [None req-2c96ae71- 9987-4e9c- 97e7-0890244d8e 63 demo demo] Exception during message handling: KeyError: AsyncDeviceEven tsHandler. Waiter( instance_ uuid=e9fa62fb- 46fb-4bfd- a22c-3dc98a88b0 e6, device_name=net0, event_types={<class 'nova.virt. libvirt. event.DeviceRem ovalFailedEvent '>, <class 'nova.virt. libvirt. event.DeviceRem ovedEvent' >}) jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server Traceback (most recent call last): jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/opt/stack/ oslo.messaging/ oslo_messaging/ rpc/server. py", line 165, in _process_incoming jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server res = self.dispatcher .dispatch( message) jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/opt/stack/ oslo.messaging/ oslo_messaging/ rpc/dispatcher. py", line 309, in dispatch jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server return self._do_ dispatch( endpoint, method, ctxt, args) jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/opt/stack/ oslo.messaging/ oslo_messaging/ rpc/dispatcher. py", line 229, in _do_dispatch jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server result = func(ctxt, **new_args) jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/opt/stack/ nova/nova/ exception_ wrapper. py", line 65, in wrapped jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server with excutils. save_and_ reraise_ exception( ): jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/usr/local/ lib/python3. 10/dist- packages/ oslo_utils/ excutils. py", line 227, in __exit__ jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server self.force_ reraise( ) jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/usr/local/ lib/python3. 10/dist- packages/ oslo_utils/ excutils. py", line 200, in force_reraise jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server raise self.value jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ERROR oslo_messaging. rpc.server File "/opt/stack/ nova/nova/ exception_ wrapper. py", line 63, in wrapped jammy-iweb- mtl01-003233752 6 nova-compute[ 80934]: ER...
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-
Nov 30 02:30:04.180887 ubuntu-