Steps to reproduce on Jammy
---
Stop libvirt systemd units
sudo systemctl stop 'libvirtd*'
Start libvirt in GDB
sudo gdb \
-iex 'set confirm off' \
-iex 'set pagination off' \
-iex 'set debuginfod enabled on' \
-iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
-ex 'set non-stop on' \
-ex 'handle SIGTERM nostop noprint pass' \
-ex 'add-symbol-file /usr/sbin/libvirtd' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
/usr/sbin/libvirtd
Add breakpoints for qemu driver cleanup and device deleted event
b qemuStateCleanup
b processDeviceDeletedEvent
run
Start a test VM with a USB mouse device
cat <<-EOF >test-vm.xml
test-vmhvm321
EOF
virsh define test-vm.xml
virsh start test-vm
$ virsh list
Id Name State
-------------------------
1 test-vm running
Delete the USB mouse device
DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | grep 'dev: usb-mouse' | cut -d'"' -f2)
virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"
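The grep/cut pipeline above can be checked without a running VM by feeding it a captured line. This is only a sketch: it assumes 'info qtree' prints the device as dev: usb-mouse, id "input0" (the alias matches the devAlias shown in the GDB backtrace), and the helper name is hypothetical.

```shell
# Hypothetical helper: read 'info qtree' text on stdin and print the
# id of the usb-mouse device (the text between the first pair of quotes).
extract_usb_mouse_id() {
    grep 'dev: usb-mouse' | cut -d'"' -f2
}

# Sample line; the indentation of real qtree output is an assumption.
sample='      dev: usb-mouse, id "input0"'
printf '%s\n' "$sample" | extract_usb_mouse_id
```

If the device had no id, cut would print the whole line instead, so it is worth checking DEVICE_ID before passing it to device_del.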
Back to GDB
Thread 25 "qemu-event" hit Breakpoint 2, 0x00007f6179ed20a7 in processDeviceDeletedEvent (devAlias=<optimized out>, vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3536
Add breakpoint to domain status XML save, and continue the thread above
b virDomainObjSave
t 25
c
Thread 25 "qemu-event" hit Breakpoint 3, virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
Check the backtrace of the domain status XML save function, coming from device deleted event
(gdb) bt
#0 virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
#1 0x00007f6179eb68c3 in qemuDomainObjSaveStatus (driver=0x7f6184035e20, obj=0x7f61842f1020) at ../../src/qemu/qemu_domain.c:5801
#2 0x00007f6179ed2159 in processDeviceDeletedEvent (devAlias=0x7f617c0073e0 "input0", vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3557
#3 qemuProcessEventHandler (data=0x7f617c0072b0, opaque=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:4184
#4 0x00007f61974fc983 in virThreadPoolWorker (opaque=<optimized out>) at ../../src/util/virthreadpool.c:164
#5 0x00007f61974fb4d9 in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:241
#6 0x00007f6196e64ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7 0x00007f6196ef6850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Leave the thread at this point
Let's trigger the shutdown path
First, increase the shutdown timer (30 seconds is too fast for me; use 30 minutes)
(gdb) b virEventAddTimeout
$ sudo kill $(pidof libvirtd)
Thread 1 "libvirtd" hit Breakpoint 4, virEventAddTimeout (timeout=30000, cb=0x7f61975bbbc0 , opaque=0x55aec684a020, ff=0x0) at ../../src/util/virevent.c:148
t 1
set $rdi = 30 * 60 * 1000
(gdb) i r $rdi
rdi 0x1b7740 1800000
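The value written to $rdi is the timeout argument in milliseconds (the original value of 30000 is the 30 seconds mentioned above). As a quick sanity check, the arithmetic and the hex value GDB reports can be reproduced in a shell; the variable name is just for illustration.

```shell
# 30 minutes expressed in milliseconds, printed in decimal and hex
# to compare against the 'i r $rdi' output.
timeout_ms=$((30 * 60 * 1000))
printf '%d 0x%x\n' "$timeout_ms" "$timeout_ms"
```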
Now skip the qemu driver's shutdown wait path, to force the (unexpected) scenario that allows the race condition:
b qemuStateShutdownWait
c
Thread 26 "daemon-shutdown" hit Breakpoint 5, qemuStateShutdownWait () at ../../src/qemu/qemu_driver.c:1055
t 26
set $rax = 0
(gdb) i r $rax
rax 0x0 0
ret
c
Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070
Check that 2 threads are stopped: cleanup (thread 1) and domain status XML save (thread 25)
(gdb) i th
Id Target Id Frame
1 Thread 0x7f6193934ac0 (LWP 2544) "libvirtd" qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070
18 Thread 0x7f616a7fc640 (LWP 2563) "gmain" (running)
19 Thread 0x7f6169ffb640 (LWP 2564) "gdbus" (running)
20 Thread 0x7f61697fa640 (LWP 2565) "udev-event" (running)
24 Thread 0x7f616affd640 (LWP 2641) "vm-test-vm" (running)
25 Thread 0x7f61687f8640 (LWP 2660) "qemu-event" virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
Confirm the qemu driver's domain XML formatter/options object is set and still referenced:
t 25
(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7f6179eb1da0
(gdb) p xmlopt.parent.parent_instance
$2 = {g_type_instance = {g_class = 0x7f6184053290}, ref_count = 1, qdata = 0x0}
Let the cleanup function and shutdown path finish
t 1
c &
Check the formatter/options again; it is *NO* longer referenced:
t 25
(gdb) p xmlopt.privateData.format
$3 = (virDomainXMLPrivateDataFormatFunc) 0x7f6179eb1da0
(gdb) p xmlopt.parent.parent_instance
$4 = {g_type_instance = {g_class = 0x0}, ref_count = 0, qdata = 0x0}
In Jammy, unlike in Focal, the object data is _not_ zeroed on the
last unreference, but that might still happen, as this really is
a use-after-free (and another thread might get/use that memory).
So, let's simulate that.
set xmlopt.privateData.format = 0
(gdb) p xmlopt.privateData.format
$5 = (virDomainXMLPrivateDataFormatFunc) 0x0
Check the VM status XML *before* the save function finishes:
$ sudo grep -e '
Let the save function continue, so that libvirt finishes shutting down:
(gdb) c
Continuing.
...
[Inferior 1 (process 2544) exited normally]
Check the VM status XML *after*:
$ sudo grep -e '
It no longer has the 'monitor path' tag/field.
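The before/after comparison can also be scripted. A minimal sketch, assuming the status XML records the monitor socket as a <monitor path=...> element (consistent with the "no monitor path" parse error); the helper name and the sample file contents are hypothetical, not real status XML.

```shell
# Hypothetical helper: succeed if the given status XML file still
# contains a monitor path element.
has_monitor_path() {
    grep -q '<monitor path=' "$1"
}

# Demonstration on two throwaway files with illustrative contents.
tmpdir=$(mktemp -d)
printf '%s\n' "<domstatus><monitor path='/some/path' type='unix'/></domstatus>" >"$tmpdir/good.xml"
printf '%s\n' "<domstatus></domstatus>" >"$tmpdir/bad.xml"

for f in good bad; do
    if has_monitor_path "$tmpdir/$f.xml"; then
        echo "$f: monitor path present"
    else
        echo "$f: monitor path missing"
    fi
done
rm -rf "$tmpdir"
```

On the reproducer host, the helper would be pointed at /run/libvirt/qemu/test-vm.xml from a root shell.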
Now, the next time libvirtd starts, it fails to parse that XML:
$ sudo systemctl start libvirtd.service
$ journalctl -b -u libvirtd.service | tail
...
... libvirtd[2789]: internal error: no monitor path
... libvirtd[2789]: Failed to load config for domain 'test-vm'
As a result, libvirt is not aware that the domain is running, and cannot manage it:
$ virsh list
Id Name State
--------------------
$ virsh list --all
Id Name State
--------------------------
- test-vm shut off
Even though it is still running:
$ pgrep -af qemu-system-x86_64 | cut -d, -f1
2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,
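The mismatch above (a pid file left behind for a domain that virsh does not list as running) can be spotted with a small script. This is a sketch under the assumption that each running domain has a NAME.pid file in the status directory, as test-vm does here; the helper name is hypothetical.

```shell
# Hypothetical helper: print domains that have a pid file in directory $1
# but are not named in the newline-separated list read from stdin.
unmanaged_domains() {
    managed=$(cat)
    for pidfile in "$1"/*.pid; do
        [ -e "$pidfile" ] || continue          # no pid files at all
        name=$(basename "$pidfile" .pid)
        printf '%s\n' "$managed" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Demonstration on a throwaway directory: only other-vm is "managed".
demo=$(mktemp -d)
touch "$demo/test-vm.pid" "$demo/other-vm.pid"
printf 'other-vm\n' | unmanaged_domains "$demo"
rm -rf "$demo"
```

On the affected host this could be used as: virsh list --name | unmanaged_domains /run/libvirt/qemu (as root if the directory is not readable).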
Stop it manually:
$ sudo kill $(sudo cat /run/libvirt/qemu/test-vm.pid)
$ sudo rm /run/libvirt/qemu/test-vm.{xml,pid}