detach_interface doesn't take an instance.uuid and can race requests that do such as delete.

Bug #1914664 reported by Dan Smith
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
Medium
Lee Yarwood

Bug Description

Seeing this failure in the gate:

https://zuul.opendev.org/t/openstack/build/7c71502b04fe47039b87f76fbe04fe56/log/controller/logs/screen-n-cpu.txt#33096

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [req-77f51485-cbc2-4f2a-8a6d-4a8ed910e585 req-a221d4f9-401e-420a-911e-8d32536a1d23 service nova] [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] detaching network adapter failed.: libvirt.libvirtError: internal error: End of file from qemu monitor

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] Traceback (most recent call last):

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2210, in detach_interface

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] wait_for_detach = guest.detach_device_with_retry(

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 423, in detach_device_with_retry

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] _try_detach_device(conf, persistent, live)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 412, in _try_detach_device

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] ctx.reraise = True

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] self.force_reraise()

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] six.reraise(self.type_, self.value, self.tb)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] raise value

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 398, in _try_detach_device

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] self.detach_device(conf, persistent=persistent, live=live)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 473, in detach_device

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] self._domain.detachDeviceFlags(device_xml, flags=flags)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] result = proxy_call(self._autowrap, f, *args, **kwargs)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] rv = execute(f, *args, **kwargs)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] six.reraise(c, e, tb)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] raise value

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] rv = meth(*args, **kwargs)

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 1515, in detachDeviceFlags

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] raise libvirtError('virDomainDetachDeviceFlags() failed')

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431] libvirt.libvirtError: internal error: End of file from qemu monitor

Feb 04 20:54:32.857198 ubuntu-focal-limestone-regionone-0022873642 nova-compute[90163]: ERROR nova.virt.libvirt.driver [instance: 7174e97c-8cf4-46c7-9498-2c5dbc452431]

This was in a cinder-glance-multistore test, but I'm not sure it has anything to do with those. That test backs glance with cinder, but otherwise I wouldn't think nova would notice unless it was a failure to snapshot, use a volume, etc.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

This is the affected domain's log: https://zuul.opendev.org/t/openstack/build/7c71502b04fe47039b87f76fbe04fe56/log/controller/logs/libvirt/qemu/instance-0000004a_log.txt but it does not help.

The libvirtd log full of

qemuProcessHandleMonitorEOF:293 : Received EOF on 0x7f046c0de400 'instance-0000004a

lines but other than that I don't see anything helpful there either.

I cannot really put a signature on this as [1] shows mostly successful build as well as [2]

[1] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22raise%20libvirtError('virDomainDetachDeviceFlags()%20failed')%5C%22
[2] http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22libvirt.libvirtError%3A%20internal%20error%3A%20End%20of%20file%20from%20qemu%20monitor%5C%22

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The quota issue is tracked in a separate bug https://bugs.launchpad.net/glance/+bug/1914665

Lee Yarwood (lyarwood)
summary: - QEMU monitor read failure in ServerStableDeviceRescueTest
+ QEMU monitor read failure in AttachInterfacesTest
Revision history for this message
Lee Yarwood (lyarwood) wrote : Re: QEMU monitor read failure in AttachInterfacesTest

Thanks for pointing to that gibi.

Ignoring c#2 now, talking to Dan it looks like this is a race in the cleanup code for AttachInterfacesTestJSON where n-cpu was handling a request (req-77f51485-cbc2-4f2a-8a6d-4a8ed910e585) to detach an interface when another request (req-1547ac08-96e2-4878-a7d2-b71a33849aad) to delete the instance came in causing the monitor error as we destroyed the domain.

I wonder if detach_interface [1] should be taking an instance.uuid lock instead of an interface specific lock? We already do this for detach_volume [2] that would stop the same happening there when terminate_instance [3] is called, taking an instance.uuid lock.

[1] https://github.com/openstack/nova/blob/fec44e5d38baa0232bf41367303b82dc332eb512/nova/compute/manager.py#L7730-L7738
[2] https://github.com/openstack/nova/blob/fec44e5d38baa0232bf41367303b82dc332eb512/nova/compute/manager.py#L7145-L7165
[3] https://github.com/openstack/nova/blob/fec44e5d38baa0232bf41367303b82dc332eb512/nova/compute/manager.py#L3018-L3055

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

For me taking the instance lock feels like a good solution.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Marking this as Triaged as we now see a way forward.

Changed in nova:
status: New → Triaged
Lee Yarwood (lyarwood)
Changed in nova:
importance: Undecided → Medium
status: Triaged → Confirmed
tags: removed: gate-failure
summary: - QEMU monitor read failure in AttachInterfacesTest
+ detach_interface doesn't take an instance.uuid and can race requests
+ that do such as delete.
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.