Comment 1 for bug 1813192

Kashyap Chamarthy (kashyapc) wrote :

From a previous debugging session with DanPB on this dreaded "cannot
acquire state change lock" error, I learned that it can occur in several
situations:

  - The QEMU process has hung

  - There _might_ be a bug in libvirt's lock handling: libvirt might run
    a QEMU monitor command but forget to release the 'state change lock'
    once the command has finished, i.e. libvirt fails to clean up its
    locks correctly.
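To make the second failure mode concrete, here is a toy Python model of a per-domain job lock. It is not libvirt code: the class, function and exception names are hypothetical, and the timeout only stands in for libvirt's own job-wait timeout. It shows how a single API that returns early without ending its job leaves the lock held, so that the next caller times out with an error naming the stale holder.

```python
import threading


class StateChangeLockTimeout(Exception):
    """Hypothetical error raised when the job lock cannot be acquired in time."""


class DomainJobLock:
    """Toy model of a per-domain 'state change lock' (not libvirt's implementation)."""

    def __init__(self, timeout=30.0):
        self._lock = threading.Lock()
        self._timeout = timeout
        self.owner = None  # name of the API currently holding the job, if any

    def begin_job(self, api_name):
        # Wait at most `timeout` seconds for the current holder to finish.
        if not self._lock.acquire(timeout=self._timeout):
            raise StateChangeLockTimeout(
                "Timed out during operation: cannot acquire state change lock "
                f"(held by {self.owner})"
            )
        self.owner = api_name

    def end_job(self):
        self.owner = None
        self._lock.release()


def buggy_block_job_abort(lock, fail):
    """Hypothetical API that forgets to end its job on an early-return path."""
    lock.begin_job("remoteDispatchDomainBlockJobAbort")
    if fail:
        return  # BUG: end_job() is skipped, so the lock stays held forever
    lock.end_job()
```

For example, with `lock = DomainJobLock(timeout=0.1)`, calling `buggy_block_job_abort(lock, fail=True)` and then `lock.begin_job("next_api")` raises `StateChangeLockTimeout` with a message naming `remoteDispatchDomainBlockJobAbort` as the holder, much like the error in this bug.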

In this case, the lock is held by "remoteDispatchDomainBlockJobAbort",
which points to a block device job not being cleaned up correctly. But
I still need the libvirt block subsystem developers' eyes on this; I
will try to get hold of them.

Further, in an upstream 'libvirt-users' mailing list thread[1] about
the same error (where the lock is held by
"remoteDispatchDomainBlockJobAbort"), one of the libvirt developers
writes:

    "This looks like some API forgot to unset the job before returning.
    In that case, restarting libvirtd is the only option. If this
    happened on the latest release please do file a bug. Otherwise try
    with the latest release."
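The "forgot to unset the job before returning" remark suggests the usual defensive pattern: tie the end of the job to a `finally` block so that every exit path (normal return, early return, exception) releases it. The following is a minimal, hypothetical Python sketch of that pattern; it is not libvirt's C implementation, and all names are illustrative.

```python
import threading
from contextlib import contextmanager


class StateChangeLockTimeout(Exception):
    """Hypothetical error raised when the job lock cannot be acquired in time."""


class DomainJobLock:
    """Toy per-domain job lock whose job is always ended, whatever happens."""

    def __init__(self, timeout=30.0):
        self._lock = threading.Lock()
        self._timeout = timeout
        self.owner = None  # name of the API currently holding the job, if any

    @contextmanager
    def job(self, api_name):
        if not self._lock.acquire(timeout=self._timeout):
            raise StateChangeLockTimeout(
                f"cannot acquire state change lock (held by {self.owner})"
            )
        self.owner = api_name
        try:
            yield
        finally:
            # Runs on normal exit, early return and exceptions alike,
            # so the job can never be "forgotten".
            self.owner = None
            self._lock.release()


def block_job_abort(lock, fail):
    """Hypothetical API: the job is ended on every exit path."""
    with lock.job("remoteDispatchDomainBlockJobAbort"):
        if fail:
            raise RuntimeError("block job abort failed")
        # (the real work, e.g. QEMU monitor calls, would go here)
```

Even when `block_job_abort(lock, fail=True)` raises, the lock is free afterwards, so a later `with lock.job(...)` succeeds instead of timing out. This only prevents the leak; a lock that is already stuck in a running libvirtd is, per the reply above, only cleared by restarting it.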

[1] https://www.redhat.com/archives/libvirt-users/2017-October/msg00042.html

Posting the QEMU log of the relevant guest ('instance-0000000f')
here (in case the CI logs get deleted):
-----------------------------------------------------------------------
2019-01-23 10:54:21.554+0000: starting up libvirt version: 4.0.0, package: 1ubuntu8.6 (Christian Ehrhardt <email address hidden> Fri, 09 Nov 2018 07:42:01 +0100), qemu version: 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.9), hostname: ubuntu-bionic-limestone-regionone-0002045447
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name guest=instance-0000000f,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-instance-0000000f/master-key.aes -machine pc-i440fx-bionic,accel=tcg,usb=off,dump-guest-core=off -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c2b71d99-ed28-4074-b5b4-b4dff57221b8 -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=18.1.0,serial=0aed5fc2-01ee-41cf-9aa9-81f5fe7a681a,uuid=c2b71d99-ed28-4074-b5b4-b4dff57221b8,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-19-instance-0000000f/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/c2b71d99-ed28-4074-b5b4-b4dff57221b8/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/opt/stack/data/nova/instances/c2b71d99-ed28-4074-b5b4-b4dff57221b8/disk.config,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=none -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -add-fd set=0,fd=29 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
2019-01-23T10:54:21.633423Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
2019-01-23T10:54:21.641511Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2019-01-23T10:54:24.695402Z qemu-system-x86_64: terminating on signal 15 from pid 22451 (/usr/sbin/libvirtd)
-----------------------------------------------------------------------

PS: Unrelated "yikes": the libvirt debug log was 250G when I did a
    `wget` on it. It crashed my browser. I keep getting misled by the
    4.1M size shown for the compressed 'libvirtd_log.txt.gz' in the
    controller/logs/libvirt/ listing. I wonder how we can reduce its
    size here, given that the gate already uses libvirt debug log
    filters.
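On the misleading size: a gzip file records the uncompressed length in its last four bytes (the ISIZE field of RFC 1952), but only modulo 2**32, so for a log of hundreds of gigabytes that number wraps around and tells you little; tools such as `gzip -l` that rely on the same field are affected in the same way. The hedged Python sketch below, meant for a locally downloaded copy of the `.gz` file, compares the trailer value with an exact count obtained by streaming decompression, which never holds more than one chunk in memory. The function names are illustrative.

```python
import gzip
import os
import struct


def gzip_trailer_size(path):
    """Uncompressed size from the gzip ISIZE trailer (RFC 1952).

    Only correct modulo 2**32 (wrong for anything >= 4 GiB), and for a
    multi-member file it describes only the last member.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # the last 4 bytes hold ISIZE, little-endian
        return struct.unpack("<I", f.read(4))[0]


def gzip_streamed_size(path, chunk_size=1 << 20):
    """Exact uncompressed size, decompressing chunk by chunk and discarding the data."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

For instance, `gzip_streamed_size("libvirtd_log.txt.gz")` on a local copy gives the true size before anyone tries to open the decompressed log; it is exact but has to read through the whole file.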