Steps with test packages on Focal (shutdown-on-init)
---

Environment:
---

On top of the LXD VM in comments #12/#13.

Enable PPA & debug symbols

sudo add-apt-repository -yn ppa:mfo/lp2059272
sudo sed '/^deb / s,$, main/debug,' -i /etc/apt/sources.list.d/mfo-ubuntu-lp2059272-focal.list
sudo apt update

Install packages

sudo apt install --yes libvirt{0,-daemon{,-driver-qemu}}{,-dbgsym} libvirt-clients gdb qemu-system-x86

$ dpkg -s libvirt-daemon | grep ^Version:
Version: 6.0.0-0ubuntu8.18~ppa1

Libvirtd debug logging

cat <) at ../../../src/qemu/qemu_process.c:8123
#2 0x00007fe64aebd54a in virThreadHelper (data=) at ../../../src/util/virthread.c:196
#3 0x00007fe64ab7e609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007fe64aaa3353 in clone () from /lib/x86_64-linux-gnu/libc.so.6

$ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1180

t 20
(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7fe644152890

Let the cleanup function finish:

t 1
finish

Notice it took a while (30 seconds).

(gdb) t 20
(gdb) p xmlopt.privateData.format
$3 = (virDomainXMLPrivateDataFormatFunc) 0x0

Let the save function continue, and libvirt finish shutting down:

(gdb) c &
(gdb) t 1
(gdb) c
(gdb) q

Check the VM status XML *after*:

ubuntu@lp2059272-focal:~$ sudo grep -e '

Everything happened as in the reproducer, i.e., the SAME behavior happened BY DEFAULT, just with a 30-second delay.

Checking the libvirtd debug logs to confirm the patch behavior:

$ sudo tail -n50 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p'
2024-03-30 22:49:24.737+0000: 6875: debug : qemuStateCleanupWait:1144 : timeout 30, timeout_env '(null)'
2024-03-30 22:49:24.737+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 0
2024-03-30 22:49:24.737+0000: 6875: warning : qemuStateCleanupWait:1153 : Waiting for qemuProcessReconnect() threads (1) to end.
Configure with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT (-1 = wait; 0 = do not wait; N = wait up to N seconds; current = 30)
2024-03-30 22:49:25.740+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 1
2024-03-30 22:49:26.740+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 2
2024-03-30 22:49:27.740+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 3
2024-03-30 22:49:28.741+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 4
2024-03-30 22:49:29.741+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 5
2024-03-30 22:49:30.741+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 6
2024-03-30 22:49:31.742+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 7
2024-03-30 22:49:32.742+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 8
2024-03-30 22:49:33.742+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 9
2024-03-30 22:49:34.742+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 10
2024-03-30 22:49:35.743+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 11
2024-03-30 22:49:36.743+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 12
2024-03-30 22:49:37.744+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 13
2024-03-30 22:49:38.744+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 14
2024-03-30 22:49:39.744+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 15
2024-03-30 22:49:40.744+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 16
2024-03-30 22:49:41.745+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 17
2024-03-30 22:49:42.745+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 18
2024-03-30 22:49:43.746+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 19
2024-03-30 22:49:44.746+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 20
2024-03-30 22:49:45.747+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 21
2024-03-30 22:49:46.747+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 22
2024-03-30 22:49:47.748+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 23
2024-03-30 22:49:48.748+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 24
2024-03-30 22:49:49.749+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 25
2024-03-30 22:49:50.749+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 26
2024-03-30 22:49:51.750+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 27
2024-03-30 22:49:52.750+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 28
2024-03-30 22:49:53.750+0000: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 29
2024-03-30 22:49:54.751+0000: 6875: warning : qemuStateCleanupWait:1164 : Leaving qemuProcessReconnect() threads (1) per timeout (30)
2024-03-30 22:51:00.315+0000: 6906: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7fe638012540 name=test-vm)
2024-03-30 22:51:00.315+0000: 6906: debug : qemuProcessReconnect:8161 : Not decrementing qemuProcessReconnect() threads as the QEMU driver is already deallocated/freed.

The warnings (and errors, if any) would also be shown in the libvirtd syslog/journal (journalctl):

$ sudo tail -n50 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p' | grep -e warning -e error
2024-03-30 22:49:24.737+0000: 6875: warning : qemuStateCleanupWait:1153 : Waiting for qemuProcessReconnect() threads (1) to end. Configure with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT (-1 = wait; 0 = do not wait; N = wait up to N seconds; current = 30)
2024-03-30 22:49:54.751+0000: 6875: warning : qemuStateCleanupWait:1164 : Leaving qemuProcessReconnect() threads (1) per timeout (30)

Stop the VM and restart it with libvirt.
sudo kill $(sudo cat /run/libvirt/qemu/test-vm.pid) && sudo rm /run/libvirt/qemu/test-vm.{pid,xml}
sudo systemctl start libvirtd.service && virsh start test-vm && sudo systemctl stop 'libvirtd*'

Scenario with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT=5
---

The same result happens with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT=5 (i.e., wait at most 5 seconds).

Repeat, with `gdb -ex 'set environment LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT 5'`:
the steps 't 1; finish' take 5 seconds instead of 30 seconds.

ubuntu@lp2059272-focal:~$ sudo grep -e '

ubuntu@lp2059272-focal:~$ sudo tail -n50 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p'
2024-03-30 23:00:11.016+0000: 7017: debug : qemuStateCleanupWait:1144 : timeout 5, timeout_env '5'
2024-03-30 23:00:11.016+0000: 7017: debug : qemuStateCleanupWait:1150 : threads 1, seconds 0
2024-03-30 23:00:11.016+0000: 7017: warning : qemuStateCleanupWait:1153 : Waiting for qemuProcessReconnect() threads (1) to end. Configure with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT (-1 = wait; 0 = do not wait; N = wait up to N seconds; current = 5)
2024-03-30 23:00:12.017+0000: 7017: debug : qemuStateCleanupWait:1150 : threads 1, seconds 1
2024-03-30 23:00:13.018+0000: 7017: debug : qemuStateCleanupWait:1150 : threads 1, seconds 2
2024-03-30 23:00:14.018+0000: 7017: debug : qemuStateCleanupWait:1150 : threads 1, seconds 3
2024-03-30 23:00:15.018+0000: 7017: debug : qemuStateCleanupWait:1150 : threads 1, seconds 4
2024-03-30 23:00:16.018+0000: 7017: warning : qemuStateCleanupWait:1164 : Leaving qemuProcessReconnect() threads (1) per timeout (5)
2024-03-30 23:00:45.694+0000: 7048: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7f40d0052de0 name=test-vm)
2024-03-30 23:00:45.694+0000: 7048: debug : qemuProcessReconnect:8161 : Not decrementing qemuProcessReconnect() threads as the QEMU driver is already deallocated/freed.
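The timeout semantics printed in the warning (-1 = wait; 0 = do not wait; N = wait up to N seconds) can be illustrated with a minimal shell sketch. It is not libvirt's implementation (that is C code in src/qemu/qemu_driver.c): the names cleanup_wait and cleanup_wait_timeout, and the thread-count command passed as an argument, are hypothetical, and the default of 30 is assumed from the "current = 30" in the logs of this test build.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT libvirt code: mimics the timeout semantics shown in
# the qemuStateCleanupWait() warning (-1 = wait; 0 = do not wait;
# N = wait up to N seconds). Assumes the value is an integer.

# Effective timeout: the environment variable, or 30 (assumed default, per
# "current = 30" in the logs) when it is unset or empty.
cleanup_wait_timeout() {
    echo "${LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT:-30}"
}

# $1: name of a (hypothetical) command that prints the number of pending
# qemuProcessReconnect() threads. Polls once per second, like the
# "threads 1, seconds N" debug lines. Prints "finished" and returns 0 if the
# count reaches 0, or prints "timeout" and returns 1 if the timeout expires.
cleanup_wait() {
    local count_threads=$1
    local timeout
    timeout=$(cleanup_wait_timeout)
    local seconds=0

    while [ "$("$count_threads")" -gt 0 ]; do
        # A negative timeout (-1) never expires: wait forever.
        if [ "$timeout" -ge 0 ] && [ "$seconds" -ge "$timeout" ]; then
            echo "timeout"
            return 1
        fi
        sleep 1
        seconds=$((seconds + 1))
    done
    echo "finished"
    return 0
}
```

With the variable set to 0 and one pending thread, the sketch gives up immediately without polling, which mirrors the TIMEOUT=0 scenario where no "threads 1, seconds 0" line is logged before the "Leaving ..." warning; with -1 it only returns once the count drops to 0.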
Scenario with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT=0
---

The same result happens with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT=0 (i.e., do not wait).

Repeat, with `gdb -ex 'set environment LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT 0'`:
the steps 't 1; finish' take 0 seconds (no wait) instead of 30 or 5 seconds.

ubuntu@lp2059272-focal:~$ sudo grep -e '

ubuntu@lp2059272-focal:~$ sudo tail -n50 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p'
2024-03-30 23:03:11.487+0000: 7124: debug : qemuStateCleanupWait:1144 : timeout 0, timeout_env '0'
2024-03-30 23:03:11.488+0000: 7124: warning : qemuStateCleanupWait:1164 : Leaving qemuProcessReconnect() threads (1) per timeout (0)
2024-03-30 23:03:15.313+0000: 7155: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7ff620052ad0 name=test-vm)
2024-03-30 23:03:15.313+0000: 7155: debug : qemuProcessReconnect:8161 : Not decrementing qemuProcessReconnect() threads as the QEMU driver is already deallocated/freed.

Scenario with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT=-1
---

A different result happens with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT=-1 (i.e., wait forever).

Repeat, with `gdb -ex 'set environment LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT -1'`:
the steps 't 1; finish' do not finish; libvirtd keeps running, waiting for the pending thread.

t 1
finish
... wait, wait, wait ...
ctrl-c
(gdb) bt
#0 0x00007fb29ceed23f in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fb29cef2ec7 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fb29d0bf557 in g_usleep () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007fb2906498f5 in qemuStateCleanupWait () at ../../../src/qemu/qemu_driver.c:1159
#4 qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1184
#5 0x00007fb29d4e746f in virStateCleanup () at ../../../src/libvirt.c:669
#6 0x00005569adc89bc8 in main (argc=, argv=) at ../../../src/remote/remote_daemon.c:1447

Check the formatter/options again; this time it is *STILL* referenced (not 0x0):

t 20
(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7fb2905d8890

Thread 1 is still in qemuStateCleanupWait(), so let it run again:

(gdb) c &

And unblock the other thread. Now libvirt finishes shutting down.

(gdb) t 20
(gdb) c
...
[Inferior 1 (process 7233) exited normally]

The logs show that the thread actually finished before libvirt exited.

ubuntu@lp2059272-focal:~$ sudo tail -n200 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p'
2024-03-30 23:06:00.512+0000: 7233: debug : qemuStateCleanupWait:1144 : timeout -1, timeout_env '-1'
2024-03-30 23:06:00.512+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 0
2024-03-30 23:06:00.512+0000: 7233: warning : qemuStateCleanupWait:1153 : Waiting for qemuProcessReconnect() threads (1) to end . Configure with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT (-1 = wait; 0 = do not wait; N = wait up to N seconds; current = -1)
2024-03-30 23:06:01.513+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 1
2024-03-30 23:06:02.513+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 2
2024-03-30 23:06:03.514+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 3
...
2024-03-30 23:09:43.994+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 130
2024-03-30 23:09:44.994+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 131
2024-03-30 23:09:45.994+0000: 7233: debug : qemuStateCleanupWait:1150 : threads 1, seconds 132
2024-03-30 23:09:46.075+0000: 7264: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7fb28c04c1c0 name=test-vm)
2024-03-30 23:09:46.075+0000: 7264: debug : qemuProcessReconnect:8158 : Decrementing qemuProcessReconnect() threads.
2024-03-30 23:09:46.995+0000: 7233: debug : qemuStateCleanupWait:1170 : All qemuProcessReconnect() threads finished

And the `monitor path` is still in the XML:

ubuntu@lp2059272-focal:~$ sudo grep -e '

Of course, the above also happens by default if the thread finishes within the default timeout (30 seconds).

Scenario: (default/real-world) no env var, and the thread finishes quickly
---

(Running the steps quickly.)

Thread 20 "libvirtd" hit Breakpoint 2, virDomainObjSave (obj=0x55c688ebbe80, xmlopt=0x55c688eb3f40, statusDir=0x55c688e78f60 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

$ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1181

(gdb) t 1
(gdb) c &
(gdb) t 20
(gdb) c
...
[Inferior 1 (process 32761) exited normally]

ubuntu@lp2059272-focal:~$ sudo tail -n50 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p'
2024-03-30 23:12:10.242+0000: 7281: debug : qemuStateCleanupWait:1144 : timeout 30, timeout_env '(null)'
2024-03-30 23:12:10.242+0000: 7281: debug : qemuStateCleanupWait:1150 : threads 1, seconds 0
2024-03-30 23:12:10.242+0000: 7281: warning : qemuStateCleanupWait:1153 : Waiting for qemuProcessReconnect() threads (1) to end.
Configure with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT (-1 = wait; 0 = do not wait; N = wait up to N seconds; current = 30)
2024-03-30 23:12:11.242+0000: 7281: debug : qemuStateCleanupWait:1150 : threads 1, seconds 1
2024-03-30 23:12:11.484+0000: 7312: debug : qemuDomainObjEndJob:9746 : Stopping job: modify (async=none vm=0x7f7b4c04c3a0 name=test-vm)
2024-03-30 23:12:11.484+0000: 7312: debug : qemuProcessReconnect:8158 : Decrementing qemuProcessReconnect() threads.
2024-03-30 23:12:12.243+0000: 7281: debug : qemuStateCleanupWait:1170 : All qemuProcessReconnect() threads finished

ubuntu@lp2059272-focal:~$ sudo grep -e '

Now, the next time libvirtd starts, it correctly parses that XML:

$ sudo systemctl start libvirtd.service

ubuntu@lp2059272-focal:~$ journalctl -b -u libvirtd.service | grep error
...
Mar 30 23:14:27 lp2059272-focal libvirtd[7325]: 7341: error : dnsmasqCapsRefreshInternal:714 : Cannot check dnsmasq binary /usr/sbin/dnsmasq: No such file or directory

And libvirt is now aware of the domain, and can manage it:

$ virsh list
 Id   Name      State
-------------------------
 1    test-vm   running

$ virsh destroy test-vm
Domain test-vm destroyed
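To tell which way a shutdown went without reading every "seconds N" line, a small filter can look for the two final qemuStateCleanupWait() messages seen in these logs ("Leaving qemuProcessReconnect() threads (N) per timeout (T)" and "All qemuProcessReconnect() threads finished"). This is a hypothetical helper (the name cleanup_wait_outcome is made up), and it assumes the message text of this test build stays exactly as shown; it reports only the last matching message in its input.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of libvirt. Reads a libvirtd debug log on
# stdin and prints how the last qemuStateCleanupWait() run ended:
#   "timeout (T)"  - "... Leaving qemuProcessReconnect() threads (N) per timeout (T)"
#   "all finished" - "... All qemuProcessReconnect() threads finished"
#   "unknown"      - neither message was found
# Assumes the exact message text shown in the logs of this test build.
cleanup_wait_outcome() {
    awk '
        /qemuStateCleanupWait:.*per timeout \(/ {
            # Extract T from "per timeout (T)"; "per timeout (" is 13 chars.
            if (match($0, /per timeout \(-?[0-9]+\)/)) {
                t = substr($0, RSTART + 13, RLENGTH - 14)
                outcome = "timeout (" t ")"
            }
        }
        /qemuStateCleanupWait:.*All qemuProcessReconnect\(\) threads finished/ {
            outcome = "all finished"
        }
        END { print (outcome == "" ? "unknown" : outcome) }
    '
}
```

For example, `sudo tail -n200 /var/log/libvirt/libvirtd-debug.log | cleanup_wait_outcome`. Given the excerpts above, this would be expected to report "timeout (30)", "timeout (5)" and "timeout (0)" for the first three scenarios and "all finished" for the last two, provided the tail covers only the most recent shutdown.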