Steps with test packages on Focal (shutdown-on-init)
---
Start test VM
cat <<-EOF >test-vm.xml
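<!-- type='qemu' and unit='MiB' are assumed values; the original attributes were lost -->
<domain type='qemu'>
  <name>test-vm</name>
  <os><type>hvm</type></os>
  <memory unit='MiB'>32</memory>
  <vcpu>1</vcpu>
</domain>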
EOF
virsh define test-vm.xml
virsh start test-vm
$ virsh list
Id Name State
-------------------------
1 test-vm running
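The state check above can also be scripted. The following is a minimal sketch, not part of the original test plan; domain_state is a hypothetical helper name, and it only parses `virsh list` output fed to it on stdin.

```shell
# Print the State column for one domain from `virsh list` output on stdin.
# domain_state is a hypothetical helper, not a libvirt command.
domain_state() {
    awk -v name="$1" '
        $2 == name {
            state = $3
            # Join any remaining fields so multi-word states ("shut off") survive.
            for (i = 4; i <= NF; i++) state = state " " $i
            print state
        }'
}
```

Possible usage: `virsh list --all | domain_state test-vm` should print `running` at this point.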
Stop libvirtd systemd units
sudo systemctl stop 'libvirtd*'
Scenario 1) Shutdown wins the race against the XML update (i.e., shutdown happens first)
Start libvirtd in GDB
sudo gdb \
-iex 'set confirm off' \
-iex 'set pagination off' \
-ex 'set non-stop on' \
-ex 'handle SIGTERM nostop noprint pass' \
-ex 'add-symbol-file /usr/sbin/libvirtd' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
/usr/sbin/libvirtd
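Since the same GDB invocation is repeated in every scenario, the settings could be kept in a GDB command file instead. This is only a sketch under assumptions: write_gdb_setup and the file name libvirtd.gdb are hypothetical, and loading the settings with `-x` (after the program file is read) rather than `-iex`/`-ex` is assumed to be equivalent for this test.

```shell
# Write the GDB settings used in these steps to a command file.
# $1: output file; remaining arguments: files to load symbols from.
# write_gdb_setup is a hypothetical helper name.
write_gdb_setup() {
    out=$1
    shift
    {
        echo 'set confirm off'
        echo 'set pagination off'
        echo 'set non-stop on'
        echo 'handle SIGTERM nostop noprint pass'
        for f in "$@"; do
            echo "add-symbol-file $f"
        done
    } >"$out"
}
```

Possible usage: `write_gdb_setup libvirtd.gdb /usr/sbin/libvirtd /usr/lib/x86_64-linux-gnu/libvirt.so.0`, followed by `sudo gdb -x libvirtd.gdb /usr/sbin/libvirtd`.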
Stop on initialization
(gdb) b qemuStateInitialize
(gdb) run
Thread 17 "libvirtd" hit Breakpoint 1, qemuStateInitialize (privileged=true, callback=0x5558939f10c0 , opaque=0x555893b905d0) at ../../../src/qemu/qemu_driver.c:644
Save the daemon 'opaque' pointer in $ptr (global variable qemu_driver_dmn is not accessible):
(gdb) p qemu_driver_dmn
Cannot access memory at address 0x1e39a8
(gdb) p 'src/qemu/qemu_driver.c'::qemu_driver_dmn
Cannot access memory at address 0x1e39a8
(gdb) t 17
(gdb) set $ptr = opaque
Run until qemuProcessReconnect
(gdb) b qemuProcessReconnect
(gdb) c
Thread 20 "libvirtd" hit Breakpoint 2, qemuProcessReconnect (opaque=0x7fd82c054900) at ../../../src/qemu/qemu_process.c:7922
Run this thread until it tries to lock qemu_driver_dmn:
(gdb) b virObjectLock thread 20 if anyobj == $ptr
(gdb) t 20
(gdb) c
Thread 20 "libvirtd" hit Breakpoint 3, virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
Confirm that the daemon is not yet shutting down:
(gdb) t 20
(gdb) p ((virNetDaemonPtr)anyobj)->quit
$1 = false
Stop the shutdown path in the main thread when it tries to lock qemu_driver_dmn:
(gdb) b virObjectLock thread 1 if anyobj == $ptr
$ sudo kill $(pidof libvirtd)
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
(gdb) t 1
(gdb) bt
#0 virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
#1 0x00007fd83eabc2d5 in virNetDaemonSignalEvent (watch=watch@entry=2, fd=<optimized out>, events=events@entry=1, opaque=opaque@entry=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:630
#2 0x00007fd83e97da0d in virEventPollDispatchHandles (fds=0x555893bc21c0, nfds=<optimized out>) at ../../../src/util/vireventpoll.c:503
#3 virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#4 0x00007fd83e97c095 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:353
#5 0x00007fd83eabd495 in virNetDaemonRun (dmn=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:836
#6 0x00005558939ef7d1 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1430
Let it deliver the signal
(gdb) c
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
(gdb) bt
#0 virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
#1 0x00007fd83eabd2ed in virNetDaemonQuit (dmn=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:854
#2 0x00007fd83eabc33e in virNetDaemonSignalEvent (watch=watch@entry=2, fd=<optimized out>, events=events@entry=1, opaque=opaque@entry=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:645
#3 0x00007fd83e97da0d in virEventPollDispatchHandles (fds=0x555893bc21c0, nfds=<optimized out>) at ../../../src/util/vireventpoll.c:503
#4 virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#5 0x00007fd83e97c095 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:353
#6 0x00007fd83eabd495 in virNetDaemonRun (dmn=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:836
#7 0x00005558939ef7d1 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1430
Let it set 'quit'
(gdb) c
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
(gdb) bt
#0 virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
#1 0x00007fd83eabd4a5 in virNetDaemonRun (dmn=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:841
#2 0x00005558939ef7d1 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1430
Let it take the lock in the event loop
(gdb) finish
Run till exit from #0 virObjectLock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427
virNetDaemonRun (dmn=0x555893b905d0) at ../../../src/rpc/virnetdaemon.c:843
Then run until the unlock call, and let it release the lock:
(gdb) b virObjectUnlock thread 1 if anyobj == $ptr
(gdb) c
Thread 1 "libvirtd" hit Breakpoint 5, virObjectUnlock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:504
(gdb) finish
Run till exit from #0 virObjectUnlock (anyobj=0x555893b905d0) at ../../../src/util/virobject.c:504
main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1434
Now let the qemuProcessReconnect thread continue. It will not update the XML file,
because 'quit' is set (i.e., a shutdown is in progress):
(gdb) t 20
(gdb) p ((virNetDaemonPtr)anyobj)->quit
$2 = true
$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 12 19:03 /run/libvirt/qemu/test-vm.xml
(gdb) c &
$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 12 19:03 /run/libvirt/qemu/test-vm.xml
This can be confirmed in the log at 'info' level:
$ sudo grep 'Leaving the update of .* domain status XML' /var/log/libvirt/libvirtd-debug.log
2024-04-12 19:22:55.466+0000: 5274: info : qemuProcessReconnect:8157 : Leaving the update of 'test-vm' domain status XML for the next initialization (shutdown detected on this initialization).
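To list only the affected domain names from such log lines, the message can be parsed with sed. This is a minimal sketch; skipped_domains is a hypothetical helper name, and it assumes the message text matches the line shown above.

```shell
# Print the domain name from each "Leaving the update of ..." log line on stdin.
# skipped_domains is a hypothetical helper name.
skipped_domains() {
    sed -n "s/.*Leaving the update of '\([^']*\)' domain status XML.*/\1/p"
}
```

Possible usage: `sudo cat /var/log/libvirt/libvirtd-debug.log | skipped_domains` should print `test-vm` for the log line above.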
Delete the breakpoints and let it run to completion; libvirtd exits normally.
(gdb) del br
(gdb) t 1
(gdb) c
[Inferior 1 (process 5194) exited normally]
(gdb) q
The XML file still has the '
Scenario 2) Shutdown loses the race against the XML update (i.e., the update happens first)
sudo gdb \
-iex 'set confirm off' \
-iex 'set pagination off' \
-ex 'set non-stop on' \
-ex 'handle SIGTERM nostop noprint pass' \
-ex 'add-symbol-file /usr/sbin/libvirtd' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
/usr/sbin/libvirtd
(gdb) b qemuStateInitialize
(gdb) run
Thread 17 "libvirtd" hit Breakpoint 1, qemuStateInitialize (privileged=true, callback=0x56262420d0c0 , opaque=0x562624b325d0) at ../../../src/qemu/qemu_driver.c:644
Save the 'opaque' pointer (qemu_driver_dmn):
(gdb) t 17
(gdb) set $ptr = opaque
Run until qemuProcessReconnect
(gdb) b qemuProcessReconnect
(gdb) c
Thread 20 "libvirtd" hit Breakpoint 2, qemuProcessReconnect (opaque=0x7fb50c261f60) at ../../../src/qemu/qemu_process.c:7922
Run this thread until it tries to lock qemu_driver_dmn:
(gdb) b virObjectLock thread 20 if anyobj == $ptr
(gdb) t 20
(gdb) c
Thread 20 "libvirtd" hit Breakpoint 3, virObjectLock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:427
Confirm that the daemon is not yet shutting down:
(gdb) t 20
(gdb) p ((virNetDaemonPtr)anyobj)->quit
$1 = false
Stop the main thread on the lock on qemu_driver_dmn, in the event loop
(gdb) b virObjectLock thread 1 if anyobj == $ptr
$ sudo kill $(pidof libvirtd)
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:427
(gdb) t 1
(gdb) bt
#0 virObjectLock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:427
#1 0x00007fae5e7a12d5 in virNetDaemonSignalEvent (watch=watch@entry=2, fd=<optimized out>, events=events@entry=1, opaque=opaque@entry=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:630
#2 0x00007fae5e662a0d in virEventPollDispatchHandles (fds=0x562624b641c0, nfds=<optimized out>) at ../../../src/util/vireventpoll.c:503
#3 virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#4 0x00007fae5e661095 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:353
#5 0x00007fae5e7a2495 in virNetDaemonRun (dmn=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:836
#6 0x000056262420b7d1 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1430
Let it deliver the signal
(gdb) c
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:427
(gdb) bt
#0 virObjectLock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:427
#1 0x00007fae5e7a22ed in virNetDaemonQuit (dmn=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:854
#2 0x00007fae5e7a133e in virNetDaemonSignalEvent (watch=watch@entry=2, fd=<optimized out>, events=events@entry=1, opaque=opaque@entry=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:645
#3 0x00007fae5e662a0d in virEventPollDispatchHandles (fds=0x562624b641c0, nfds=<optimized out>) at ../../../src/util/vireventpoll.c:503
#4 virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#5 0x00007fae5e661095 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:353
#6 0x00007fae5e7a2495 in virNetDaemonRun (dmn=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:836
#7 0x000056262420b7d1 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1430
Do NOT let it set 'quit' yet.
Instead, let the qemuProcessReconnect thread take the lock and update the XML file, but stop it before it unlocks:
(gdb) t 20
(gdb) bt
#0 virObjectLock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:427
#1 0x00007fae487b922d in qemuProcessReconnect (opaque=<optimized out>) at ../../../src/qemu/qemu_process.c:8155
#2 0x00007fae5e6c054a in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
#3 0x00007fae5e381609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4 0x00007fae5e2a6353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 12 19:03 /run/libvirt/qemu/test-vm.xml
(gdb) b virObjectUnlock thread 20 if anyobj == $ptr
(gdb) c
Thread 20 "libvirtd" hit Breakpoint 5, virObjectUnlock (anyobj=0x562624b325d0) at ../../../src/util/virobject.c:504
$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 12 19:31 /run/libvirt/qemu/test-vm.xml
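The change in modification time above (19:03 to 19:31) shows that the XML file was rewritten. The comparison can be scripted as follows. This is a minimal sketch assuming GNU stat; mtime_of and was_updated are hypothetical helper names.

```shell
# Print a file's modification time in seconds since the epoch (GNU stat).
mtime_of() {
    stat -c %Y "$1"
}

# Succeed if the file's mtime differs from a previously recorded value.
# $1: file path; $2: value recorded earlier with mtime_of.
was_updated() {
    [ "$(mtime_of "$1")" != "$2" ]
}
```

Possible usage: record `before=$(mtime_of /run/libvirt/qemu/test-vm.xml)` before continuing the thread, then run `was_updated /run/libvirt/qemu/test-vm.xml "$before" && echo updated` afterwards.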
Let the main thread run again, and see that it blocks on the lock it needs in order to set 'quit':
(gdb) t 1
(gdb) c &
(gdb) i th 1
Id Target Id Frame
* 1 Thread 0x7f57fde12b40 (LWP 97120) "libvirtd" (running)
(gdb) interrupt
(gdb) bt
#0 __lll_lock_wait (futex=futex@entry=0x562624b325e0, private=0) at lowlevellock.c:52
#1 0x00007fae5e3840a3 in __GI___pthread_mutex_lock (mutex=0x562624b325e0) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fae5e7a22ed in virNetDaemonQuit (dmn=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:854
#3 0x00007fae5e7a133e in virNetDaemonSignalEvent (watch=watch@entry=2, fd=<optimized out>, events=events@entry=1, opaque=opaque@entry=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:645
#4 0x00007fae5e662a0d in virEventPollDispatchHandles (fds=0x562624b641c0, nfds=<optimized out>) at ../../../src/util/vireventpoll.c:503
#5 virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#6 0x00007fae5e661095 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:353
#7 0x00007fae5e7a2495 in virNetDaemonRun (dmn=0x562624b325d0) at ../../../src/rpc/virnetdaemon.c:836
#8 0x000056262420b7d1 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1430
(gdb) c &
Let the qemuProcessReconnect thread finish;
the main thread then unblocks and finishes too:
(gdb) del br
(gdb) t 20
(gdb) c
...
[Inferior 1 (process 5335) exited normally]
(gdb) q
The XML file still has the '
Scenario 3) Shutdown happens during QEMU monitor calls (i.e., the calls do not finish)
sudo gdb \
-iex 'set confirm off' \
-iex 'set pagination off' \
-ex 'set non-stop on' \
-ex 'handle SIGTERM nostop noprint pass' \
-ex 'add-symbol-file /usr/sbin/libvirtd' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
/usr/sbin/libvirtd
(gdb) b qemuProcessReconnect
(gdb) run
Thread 20 "libvirtd" hit Breakpoint 1, qemuProcessReconnect (opaque=0x7f23b0055d30) at ../../../src/qemu/qemu_process.c:7922
Run this thread until a QEMU monitor send call:
(gdb) t 20
(gdb) b qemuMonitorSend thread 20
(gdb) c
Thread 20 "libvirtd" hit Breakpoint 2, qemuMonitorSend (mon=0x7f23980023c0, msg=0x7f238e35f7b0) at ../../../src/qemu/qemu_monitor.c:979
Stop the main thread on the QEMU driver cleanup, after the event loop is gone:
(gdb) b qemuStateCleanup
$ sudo kill $(pidof libvirtd)
Thread 1 "libvirtd" hit Breakpoint 3, qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1127
(gdb) t 1
(gdb) bt
#0 qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1127
#1 0x00007f23c4a8c47f in virStateCleanup () at ../../../src/libvirt.c:669
#2 0x000055ccdfadebc8 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1447
Let it finish
(gdb) finish
Run till exit from #0 qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1127
0x00007f23c4a8c47f in virStateCleanup () at ../../../src/libvirt.c:669
Let the qemuProcessReconnect thread continue, and see that it blocks waiting for a reply that the (now stopped) event loop would have to deliver:
(gdb) t 20
(gdb) c &
(gdb) i th 20
Id Target Id Frame
* 20 Thread 0x7f9a157fa700 (LWP 97193) "libvirtd" (running)
(gdb) interrupt
Thread 20 "libvirtd" stopped.
(gdb) bt
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f2398002420) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f23980023d0, cond=0x7f23980023f8) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7f23980023f8, mutex=0x7f23980023d0) at pthread_cond_wait.c:647
#3 0x00007f23c48ee79b in virCondWait (c=<optimized out>, m=<optimized out>) at ../../../src/util/virthread.c:144
#4 0x00007f239e994684 in qemuMonitorSend (mon=0x7f23980023c0, msg=<optimized out>) at ../../../src/qemu/qemu_monitor.c:998
#5 0x00007f239e9a3dc8 in qemuMonitorJSONCommandWithFd (mon=0x7f23980023c0, cmd=0x7f23980027b0, scm_fd=-1, reply=0x7f238e35f840) at ../../../src/qemu/qemu_monitor_json.c:328
#6 0x00007f239e9a5eb5 in qemuMonitorJSONCommand (reply=0x7f238e35f840, cmd=0x7f23980027b0, mon=<optimized out>) at ../../../src/qemu/qemu_monitor_json.c:1602
#7 qemuMonitorJSONSetCapabilities (mon=<optimized out>) at ../../../src/qemu/qemu_monitor_json.c:1602
#8 0x00007f239e973b4c in qemuProcessInitMonitor (asyncJob=QEMU_ASYNC_JOB_NONE, vm=0x7f23b004f9b0, driver=0x7f23b000f1e0) at ../../../src/qemu/qemu_process.c:1932
#9 qemuConnectMonitor (driver=driver@entry=0x7f23b000f1e0, vm=0x7f23b004f9b0, asyncJob=asyncJob@entry=0, retry=retry@entry=false, logCtxt=logCtxt@entry=0x0) at ../../../src/qemu/qemu_process.c:1992
#10 0x00007f239e97fbca in qemuProcessReconnect (opaque=<optimized out>) at ../../../src/qemu/qemu_process.c:7978
#11 0x00007f23c48ee54a in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
#12 0x00007f23c45af609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f23c44d4353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Let it run again; it stays blocked while the main thread finishes:
(gdb) c &
(gdb) i th 1 20
Id Target Id Frame
1 Thread 0x7f23c0a95b40 (LWP 5512) "libvirtd" 0x00007f23c4a8c47f in virStateCleanup () at ../../../src/libvirt.c:669
* 20 Thread 0x7f238e360700 (LWP 5590) "libvirtd" (running)
(gdb) t 1
(gdb) c
Continuing.
[Thread 0x7f238e360700 (LWP 5590) exited]
Thread-specific breakpoint 2 deleted - thread 20 no longer in the thread list.
...
[Inferior 1 (process 5512) exited normally]
(gdb) q
The XML was not updated, as expected:
$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 12 19:31 /run/libvirt/qemu/test-vm.xml
$ sudo grep -e '
Now, the next time libvirtd starts, it correctly parses that XML:
$ sudo systemctl start libvirtd.service
$ journalctl -b -u libvirtd.service | grep -A1 error
$
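The empty grep result above means no error was logged. As a pass/fail check this could be written as below. This is a minimal sketch; no_errors is a hypothetical helper name, and it matches the plain string "error" just like the grep above.

```shell
# Succeed only if stdin contains no line mentioning "error".
# no_errors is a hypothetical helper name.
no_errors() {
    ! grep -q error
}
```

Possible usage: `journalctl -b -u libvirtd.service | no_errors && echo PASS`.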
And libvirt is aware of the domain, and can manage it:
$ virsh list
Id Name State
-------------------------
1 test-vm running
$ virsh destroy test-vm
Domain test-vm destroyed
$ virsh undefine test-vm
Domain test-vm has been undefined