Traceback from "detaching network adapter failed.: libvirtError: internal error: End of file from qemu monitor" when network-vif-delete races with server delete

Bug #1797966 reported by Matt Riedemann
This bug affects 2 people
Affects                   Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)  Fix Released  Medium      John Garbutt
  Pike                    Triaged       Medium      Unassigned
  Queens                  Triaged       Medium      Unassigned
  Rocky                   Triaged       Medium      Unassigned

Bug Description

Seen here:

http://logs.openstack.org/98/610098/2/check/tempest-full/e51c5d8/testr_results.html.gz

ft1.1: setUpClass (tempest.api.compute.servers.test_create_server.ServersTestJSON)_StringException: Traceback (most recent call last):
  File "tempest/test.py", line 172, in setUpClass
    six.reraise(etype, value, trace)
  File "tempest/test.py", line 165, in setUpClass
    cls.resource_setup()
  File "tempest/api/compute/servers/test_create_server.py", line 64, in resource_setup
    volume_backed=cls.volume_backed)
  File "tempest/api/compute/base.py", line 246, in create_test_server
    **kwargs)
  File "tempest/common/compute.py", line 258, in create_test_server
    server['id'])
  File "/opt/stack/tempest/.tox/tempest/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/opt/stack/tempest/.tox/tempest/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "tempest/common/compute.py", line 229, in create_test_server
    clients.servers_client, server['id'], wait_until)
  File "tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServersTestJSON:setUpClass) Server 16679c49-ba5b-4c39-a77b-1e3a45c628b5 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: BUILD. Current task state: spawning.

The real failure is in the compute logs:

http://logs.openstack.org/98/610098/2/check/tempest-full/e51c5d8/controller/logs/screen-n-cpu.txt.gz#_Oct_15_17_15_17_661456

Oct 15 17:15:17.661456 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [None req-7cd1f34e-77cd-4e7c-a38f-927d183bcb2f service nova] [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] detaching network adapter failed.: libvirtError: internal error: End of file from qemu monitor
Oct 15 17:15:17.661716 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] Traceback (most recent call last):
Oct 15 17:15:17.661934 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1723, in detach_interface
Oct 15 17:15:17.662180 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] alternative_device_name=self.vif_driver.get_vif_devname(vif))
Oct 15 17:15:17.662401 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 430, in detach_device_with_retry
Oct 15 17:15:17.662623 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] _try_detach_device(conf, persistent, live)
Oct 15 17:15:17.662848 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 419, in _try_detach_device
Oct 15 17:15:17.663110 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] device=alternative_device_name)
Oct 15 17:15:17.663313 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 15 17:15:17.663522 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] self.force_reraise()
Oct 15 17:15:17.663742 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 15 17:15:17.663974 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] six.reraise(self.type_, self.value, self.tb)
Oct 15 17:15:17.664193 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 397, in _try_detach_device
Oct 15 17:15:17.664404 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] self.detach_device(conf, persistent=persistent, live=live)
Oct 15 17:15:17.664604 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 480, in detach_device
Oct 15 17:15:17.664828 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] self._domain.detachDeviceFlags(device_xml, flags=flags)
Oct 15 17:15:17.665050 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in doit
Oct 15 17:15:17.665271 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] result = proxy_call(self._autowrap, f, *args, **kwargs)
Oct 15 17:15:17.665528 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Oct 15 17:15:17.665735 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] rv = execute(f, *args, **kwargs)
Oct 15 17:15:17.665950 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in execute
Oct 15 17:15:17.666151 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] six.reraise(c, e, tb)
Oct 15 17:15:17.666288 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
Oct 15 17:15:17.666436 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] rv = meth(*args, **kwargs)
Oct 15 17:15:17.666647 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1223, in detachDeviceFlags
Oct 15 17:15:17.666773 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self)
Oct 15 17:15:17.666903 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] libvirtError: internal error: End of file from qemu monitor
Oct 15 17:15:17.667071 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: ERROR nova.virt.libvirt.driver [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5]
Oct 15 17:15:17.667193 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: WARNING nova.compute.manager [None req-7cd1f34e-77cd-4e7c-a38f-927d183bcb2f service nova] [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] Detach interface failed, port_id=2bee4fe2-2c28-4932-bc54-d4c6935ae9e7, reason: Failed to detach network adapter device from 16679c49-ba5b-4c39-a77b-1e3a45c628b5: InterfaceDetachFailed: Failed to detach network adapter device from 16679c49-ba5b-4c39-a77b-1e3a45c628b5
Oct 15 17:15:17.667322 ubuntu-xenial-ovh-bhs1-0003042445 nova-compute[14841]: INFO nova.virt.libvirt.driver [-] [instance: 16679c49-ba5b-4c39-a77b-1e3a45c628b5] Instance destroyed successfully.

What happens is that the instance delete races with a network-vif-deleted event: by the time we try to detach the network interface, the guest is already gone. This is similar to bug 1536671 but evidently surfaces differently with newer libvirt (this run used libvirt 4.0.0 and qemu 2.11).

Looking at this change:

https://review.openstack.org/#/c/270891/4/nova/virt/libvirt/driver.py

The one issue there is that we're using a stale guest object. We should instead fetch a fresh guest object from the host; if the guest is gone, an InstanceNotFound exception should be raised, which is the appropriate error in this case and would be handled and logged in the compute manager.
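The proposed fix can be sketched as follows. This is a simplified, standalone model, not nova code: the `Host`, `Guest`, and `InstanceNotFound` classes below are stand-ins for the real nova/libvirt objects, chosen only to illustrate why re-fetching the guest handle (rather than reusing a stale one) turns a scary libvirt traceback into a cleanly handled not-found case.

```python
class InstanceNotFound(Exception):
    """Raised when the guest no longer exists on the hypervisor."""


class Guest:
    """Stand-in for a libvirt guest handle."""

    def __init__(self, alive=True):
        self.alive = alive

    def detach_device(self, device):
        if not self.alive:
            # A stale handle to a destroyed guest fails deep inside
            # libvirt ("End of file from qemu monitor" in the real bug).
            raise RuntimeError(
                "internal error: End of file from qemu monitor")
        return "detached %s" % device


class Host:
    """Stand-in for the hypervisor host; looks guests up by uuid."""

    def __init__(self, guests):
        self._guests = guests  # instance uuid -> Guest

    def get_guest(self, instance_uuid):
        try:
            return self._guests[instance_uuid]
        except KeyError:
            raise InstanceNotFound(instance_uuid)


def detach_interface(host, instance_uuid, device):
    # Re-fetch the guest instead of holding a stale reference; if the
    # instance was deleted concurrently, trap InstanceNotFound, log,
    # and return instead of raising an error traceback.
    try:
        guest = host.get_guest(instance_uuid)
    except InstanceNotFound:
        return "instance %s gone, nothing to detach" % instance_uuid
    return guest.detach_device(device)
```

When the instance still exists the detach proceeds normally; when it was deleted concurrently, the lookup fails with InstanceNotFound and the caller gets an informative log message rather than a libvirt internal error.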

Tags: libvirt
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/610727

Changed in nova:
status: Triaged → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

I've corrected the title of the bug since it doesn't cause a failure in tempest as far as I can tell; it's just a race that generates a confusing traceback in the logs, which makes it hard to debug real failures.

summary: - Tempest fails with "detaching network adapter failed.: libvirtError:
- internal error: End of file from qemu monitor"
+ Traceback from "detaching network adapter failed.: libvirtError:
+ internal error: End of file from qemu monitor" when network-vif-delete
+ races with server delete
Changed in nova:
assignee: Matt Riedemann (mriedem) → John Garbutt (johngarbutt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/610727
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6c3e8bc48e460c96f4fc8d975a63ef76e6f55694
Submitter: Zuul
Branch: master

commit 6c3e8bc48e460c96f4fc8d975a63ef76e6f55694
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 15 16:45:14 2018 -0400

    libvirt: don't log error if guest gone during interface detach

    Similar to change I8ae352ff3eeb760c97d1a6fa9d7a59e881d7aea1, if
    we're processing a network-vif-deleted event while an instance
    is being deleted, the asynchronous interface detach could fail
    because the guest is gone from the hypervisor. The existing code
    for handling this case was using a stale guest object so this
    change tries to refresh the guest from the hypervisor and if the
    guest is gone, the Host.get_guest() method should raise an
    InstanceNotFound exception which we just trap, log and return.

    Change-Id: Ic4c870cc5078d3f7ac6b2f96f8904c2a47de418e
    Closes-Bug: #1797966

Changed in nova:
status: In Progress → Fix Released