libvirt: instance delete fails with "Cannot destroy instance, operation time out: libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)" in bionic nodes (libvirt 4.0.0, qemu 2.11)

Bug #1813192 reported by Matt Riedemann
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Seen here:

http://logs.openstack.org/71/605871/10/gate/tempest-full-py3/a35b351/controller/logs/screen-n-cpu.txt.gz#_Jan_23_10_54_54_897840

Jan 23 10:54:54.897840 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: WARNING nova.virt.libvirt.driver [None req-b9596061-eba4-4729-a239-da35f738fd1c tempest-ImagesTestJSON-884514011 tempest-ImagesTestJSON-884514011] [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] Cannot destroy instance, operation time out: libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [None req-b9596061-eba4-4729-a239-da35f738fd1c tempest-ImagesTestJSON-884514011 tempest-ImagesTestJSON-884514011] [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] Setting instance vm_state to ERROR: nova.exception.InstancePowerOffFailure: Failed to power off instance: operation time out
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] Traceback (most recent call last):
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 890, in _destroy
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] guest.poweroff()
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 148, in poweroff
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] self._domain.destroy()
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] result = proxy_call(self._autowrap, f, *args, **kwargs)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] rv = execute(f, *args, **kwargs)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] six.reraise(c, e, tb)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] raise value
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 83, in tworker
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] rv = meth(*args, **kwargs)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 1142, in destroy
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] if ret == -1: raise libvirtError ('virDomainDestroy() failed', dom=self)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8]
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] During handling of the above exception, another exception occurred:
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8]
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] Traceback (most recent call last):
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/compute/manager.py", line 2673, in do_terminate_instance
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] self._delete_instance(context, instance, bdms)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/hooks.py", line 154, in inner
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] rv = f(*args, **kwargs)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/compute/manager.py", line 2610, in _delete_instance
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] self._shutdown_instance(context, instance, bdms)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/compute/manager.py", line 2495, in _shutdown_instance
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] pass
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] self.force_reraise()
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] six.reraise(self.type_, self.value, self.tb)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] raise value
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/compute/manager.py", line 2489, in _shutdown_instance
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] block_device_info)
Jan 23 10:54:54.919731 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1000, in destroy
Jan 23 10:54:54.925777 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] self._destroy(instance)
Jan 23 10:54:54.925777 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 920, in _destroy
Jan 23 10:54:54.925777 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] raise exception.InstancePowerOffFailure(reason=reason)
Jan 23 10:54:54.925777 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8] nova.exception.InstancePowerOffFailure: Failed to power off instance: operation time out
Jan 23 10:54:54.925777 ubuntu-bionic-limestone-regionone-0002045447 nova-compute[19182]: ERROR nova.compute.manager [instance: c2b71d99-ed28-4074-b5b4-b4dff57221b8]

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Cannot%20destroy%20instance%2C%20operation%20time%20out%3A%20libvirt.libvirtError%3A%20Timed%20out%20during%20operation%3A%20cannot%20acquire%20state%20change%20lock%20(held%20by%20remoteDispatchDomainBlockJobAbort)%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

3 hits in 7 days, in both the check and gate queues, all failures, and only on ubuntu-bionic nodes.

From the libvirtd logs, I see a lot of this for the guest that fails to delete:

2019-01-23 10:54:25.336+0000: 22451: debug : qemuMonitorIO:767 : Error on monitor Unable to read from monitor: Connection reset by peer
2019-01-23 10:54:25.336+0000: 22451: debug : qemuMonitorIO:788 : Triggering EOF callback
2019-01-23 10:54:25.336+0000: 22451: debug : qemuProcessHandleMonitorEOF:289 : Received EOF on 0x7f25c4039340 'instance-0000000f'
2019-01-23 10:54:25.336+0000: 22451: debug : qemuProcessHandleMonitorEOF:293 : Domain is being destroyed, EOF is expected
2019-01-23 10:54:25.336+0000: 22451: debug : qemuMonitorIO:767 : Error on monitor Unable to read from monitor: Connection reset by peer
2019-01-23 10:54:25.336+0000: 22451: debug : qemuMonitorIO:788 : Triggering EOF callback
2019-01-23 10:54:25.336+0000: 22451: debug : qemuProcessHandleMonitorEOF:289 : Received EOF on 0x7f25c4039340 'instance-0000000f'
2019-01-23 10:54:25.336+0000: 22451: debug : qemuProcessHandleMonitorEOF:293 : Domain is being destroyed, EOF is expected

Judging by the endlessly repeated EOF callbacks, it seems that libvirt gets stuck in some kind of loop handling the monitor EOF for this guest.
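
For reference, here is a minimal sketch of the code path the traceback above goes through (illustrative only, not the actual Nova driver code; destroy_guest() and the local InstancePowerOffFailure class are placeholders). virDomainDestroy() cannot acquire the per-domain state change lock because remoteDispatchDomainBlockJobAbort still holds it, libvirt gives up after its job-wait timeout and raises libvirtError with VIR_ERR_OPERATION_TIMEOUT, and the driver converts that into the power-off failure that puts the instance into ERROR:

-----------------------------------------------------------------------
import libvirt


class InstancePowerOffFailure(Exception):
    """Stand-in for nova.exception.InstancePowerOffFailure."""


def destroy_guest(conn, instance_name):
    # Equivalent of Guest.poweroff() -> self._domain.destroy() in the
    # traceback above.
    dom = conn.lookupByName(instance_name)
    try:
        dom.destroy()
    except libvirt.libvirtError as err:
        # libvirtd could not take the domain's state change lock within
        # its timeout because another job (here the block job abort) is
        # still holding it.
        if err.get_error_code() == libvirt.VIR_ERR_OPERATION_TIMEOUT:
            raise InstancePowerOffFailure(
                'Failed to power off instance: operation time out')
        raise


# e.g. destroy_guest(libvirt.open('qemu:///system'), 'instance-0000000f')
-----------------------------------------------------------------------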

Kashyap Chamarthy (kashyapc) wrote :

From a previous debugging session with DanPB on this dreaded "cannot
acquire state change lock" error, it can occur in various situations:

  - The QEMU process has hung

  - There _might_ be a bug in libvirt's lock handling: libvirt might run
    a QEMU monitor command, but might forget to release the 'state
    change lock' once the monitor command finished. So libvirt seems to
    be failing to clean up its locks correctly.

In this case, the lock is held by "remoteDispatchDomainBlockJobAbort",
which points to a block device job not being cleaned up correctly. But
I still need the libvirt block subsystem developers' eyes on this; I
will try to get hold of them.

Further, based on an upstream 'libvirt-users' mailing list thread[1] on
the same error (where the lock is held by
"remoteDispatchDomainBlockJobAbort"), one of the libvirt developers
writes:

    "This looks like some API forgot to unset the job before returning.
    In that case, restarting libvirtd is the only option. If this
    happened on the latest release please do file a bug. Otherwise try
    with the latest release."

[1] https://www.redhat.com/archives/libvirt-users/2017-October/msg00042.html
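
To make the contention more concrete: "remoteDispatchDomainBlockJobAbort"
is the libvirtd-side dispatcher for the virDomainBlockJobAbort() API, i.e.
a blockJobAbort() call from the client; in this job it was presumably
issued by Nova's live-snapshot cleanup (the failing test is
ImagesTestJSON, which creates image snapshots), though that part is an
inference. Below is a hypothetical, heavily simplified sketch of the two
concurrent calls that end up competing for the same per-domain job lock
(the thread layout and the 'vda' disk name are assumptions for
illustration):

-----------------------------------------------------------------------
import threading

import libvirt


def finish_snapshot(dom, disk='vda'):
    # This is what remoteDispatchDomainBlockJobAbort is servicing: cancel
    # the running block job for the guest.  While it runs, libvirtd holds
    # the domain's state change (job) lock.
    dom.blockJobAbort(disk, 0)


def delete_instance(dom):
    # virDomainDestroy() must take the same lock.  If the abort above
    # never completes (hung QEMU monitor, or libvirt forgetting to release
    # the lock), this fails after libvirt's job-wait timeout with "cannot
    # acquire state change lock (held by remoteDispatchDomainBlockJobAbort)".
    dom.destroy()


conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-0000000f')
threading.Thread(target=finish_snapshot, args=(dom,)).start()
delete_instance(dom)
-----------------------------------------------------------------------

If the quoted analysis is right, restarting libvirtd is the only way to
release a lock that was never unset; Nova has no handle on it and can
only surface InstancePowerOffFailure, as seen in the traceback above.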

Posting the QEMU log of the relevant guest ('instance-0000000f')
here (in case the CI logs get deleted):
-----------------------------------------------------------------------
2019-01-23 10:54:21.554+0000: starting up libvirt version: 4.0.0, package: 1ubuntu8.6 (Christian Ehrhardt <email address hidden> Fri, 09 Nov 2018 07:42:01 +0100), qemu version: 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.9), hostname: ubuntu-bionic-limestone-regionone-0002045447
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name guest=instance-0000000f,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-instance-0000000f/master-key.aes -machine pc-i440fx-bionic,accel=tcg,usb=off,dump-guest-core=off -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c2b71d99-ed28-4074-b5b4-b4dff57221b8 -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=18.1.0,serial=0aed5fc2-01ee-41cf-9aa9-81f5fe7a681a,uuid=c2b71d99-ed28-4074-b5b4-b4dff57221b8,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-19-instance-0000000f/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/c2b71d99-ed28-4074-b5b4-b4dff57221b8/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/opt/stack/data/nova/instances/c2b71d99-ed28-4074-b5b4-b4dff57221b8/disk.config,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=none -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -add-fd set=0,fd=29 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:0 -device cirrus-vga,id=vi...
