Comment 1 for bug 2059272

Mauricio Faria de Oliveira (mfo) wrote:

Steps to reproduce on Jammy
---

Stop libvirt systemd units

 sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

 sudo gdb \
   -iex 'set confirm off' \
   -iex 'set pagination off' \
   -iex 'set debuginfod enabled on' \
   -iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
   -ex 'set non-stop on' \
   -ex 'handle SIGTERM nostop noprint pass' \
   -ex 'add-symbol-file /usr/sbin/libvirtd' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
   /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and device deleted event

 b qemuStateCleanup
 b processDeviceDeletedEvent
 run

Start a test VM with a USB mouse device

 cat <<-EOF >test-vm.xml
 <domain type='qemu'>
   <name>test-vm</name>
   <os>
     <type>hvm</type>
   </os>
   <memory unit='MiB'>32</memory>
   <vcpu>1</vcpu>
   <devices>
     <input type='mouse' bus='usb'/>
   </devices>
 </domain>
 EOF

 virsh define test-vm.xml
 virsh start test-vm

 $ virsh list
  Id   Name      State
 -------------------------
  1    test-vm   running

Delete the USB mouse device

 DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | grep 'dev: usb-mouse' | cut -d'"' -f2)
 virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"
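
The lookup above yields an empty DEVICE_ID if no matching device is found, and device_del would then be called without an id. A minimal sketch of a guarded variant, assuming 'info qtree' prints lines of the form dev: usb-mouse, id "input0" (the function name qtree_device_id is hypothetical):

```shell
# Print the id of the first device of the given type found in
# 'info qtree' output read from stdin; return non-zero if none is found.
# Assumes the id is the first double-quoted string on the matching line.
qtree_device_id() {
    _id=$(grep -m1 "dev: $1" | cut -d'"' -f2)
    [ -n "$_id" ] && printf '%s\n' "$_id"
}

# Usage (needs the running test VM):
#   DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' \
#               | qtree_device_id usb-mouse) || echo 'no usb-mouse found'
```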

Back to GDB

 Thread 25 "qemu-event" hit Breakpoint 2, 0x00007f6179ed20a7 in processDeviceDeletedEvent (devAlias=<optimized out>, vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3536

Add breakpoint to domain status XML save, and continue the thread above

 b virDomainObjSave
 t 25
 c

 Thread 25 "qemu-event" hit Breakpoint 3, virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Check the backtrace of the domain status XML save function, coming from device deleted event

 (gdb) bt
 #0 virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
 #1 0x00007f6179eb68c3 in qemuDomainObjSaveStatus (driver=0x7f6184035e20, obj=0x7f61842f1020) at ../../src/qemu/qemu_domain.c:5801
 #2 0x00007f6179ed2159 in processDeviceDeletedEvent (devAlias=0x7f617c0073e0 "input0", vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3557
 #3 qemuProcessEventHandler (data=0x7f617c0072b0, opaque=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:4184
 #4 0x00007f61974fc983 in virThreadPoolWorker (opaque=<optimized out>) at ../../src/util/virthreadpool.c:164
 #5 0x00007f61974fb4d9 in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:241
 #6 0x00007f6196e64ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
 #7 0x00007f6196ef6850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Leave the thread at this point

Let's trigger the shutdown path

First, increase the shutdown timer (30 seconds is too fast for me; use 30 minutes)

 (gdb) b virEventAddTimeout

 $ sudo kill $(pidof libvirtd)

 Thread 1 "libvirtd" hit Breakpoint 4, virEventAddTimeout (timeout=30000, cb=0x7f61975bbbc0 <virNetDaemonFinishTimer>, opaque=0x55aec684a020, ff=0x0) at ../../src/util/virevent.c:148

 t 1
 set $rdi = 30 * 60 * 1000

 (gdb) i r $rdi
 rdi 0x1b7740 1800000
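
The value written to $rdi is the timeout in milliseconds (the breakpoint shows timeout=30000 for the default 30 seconds). A small sketch of the arithmetic, to cross-check the register dump above (the function name minutes_to_ms is hypothetical):

```shell
# Convert minutes to milliseconds, the unit of the 'timeout' argument
# seen in the virEventAddTimeout breakpoint output.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Print the value in decimal and hex, as 'i r $rdi' shows both.
ms=$(minutes_to_ms 30)
printf '%d 0x%x\n' "$ms" "$ms"   # prints: 1800000 0x1b7740
```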

Now, skip the qemu driver's shutdown wait path (return from it immediately, with 0), to force the unexpected scenario that allows the race condition:

 b qemuStateShutdownWait
 c

 Thread 26 "daemon-shutdown" hit Breakpoint 5, qemuStateShutdownWait () at ../../src/qemu/qemu_driver.c:1055

 t 26
 set $rax = 0

 (gdb) i r $rax
 rax 0x0 0

 ret
 c

 Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070

Check that the 2 threads of interest are stopped: cleanup (thread 1) and domain status XML save (thread 25)

 (gdb) i th
   Id Target Id Frame
   1 Thread 0x7f6193934ac0 (LWP 2544) "libvirtd" qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070
   18 Thread 0x7f616a7fc640 (LWP 2563) "gmain" (running)
   19 Thread 0x7f6169ffb640 (LWP 2564) "gdbus" (running)
   20 Thread 0x7f61697fa640 (LWP 2565) "udev-event" (running)
   24 Thread 0x7f616affd640 (LWP 2641) "vm-test-vm" (running)
   25 Thread 0x7f61687f8640 (LWP 2660) "qemu-event" virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Confirm that the qemu driver's domain XML options object is still referenced, and that its private data formatter is set:

 t 25

 (gdb) p xmlopt.privateData.format
 $1 = (virDomainXMLPrivateDataFormatFunc) 0x7f6179eb1da0 <qemuDomainObjPrivateXMLFormat>

 (gdb) p xmlopt.parent.parent_instance
 $2 = {g_type_instance = {g_class = 0x7f6184053290}, ref_count = 1, qdata = 0x0}

Let the cleanup function and shutdown path finish

 t 1
 c &

Check the formatter/options again; the object is *NO* longer referenced (ref_count is 0):

 t 25

 (gdb) p xmlopt.privateData.format
 $3 = (virDomainXMLPrivateDataFormatFunc) 0x7f6179eb1da0 <qemuDomainObjPrivateXMLFormat>

 (gdb) p xmlopt.parent.parent_instance
 $4 = {g_type_instance = {g_class = 0x0}, ref_count = 0, qdata = 0x0}

In Jammy, the object data is _not_ zeroed on the last unreference
anymore, as it is in Focal, but it might still be overwritten, as this
is really a use-after-free (and another thread might get/use that memory).

So, let's simulate that.

 set xmlopt.privateData.format = 0

 (gdb) p xmlopt.privateData.format
 $5 = (virDomainXMLPrivateDataFormatFunc) 0x0

Check the VM status XML *before* the save function finishes:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='2638'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>

Let the save function continue, so that libvirt finishes shutting down:

 (gdb) c
 Continuing.
 ...
 [Inferior 1 (process 2544) exited normally]

Check the VM status XML *after*:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='2638'>
   <domain type='qemu' id='1'>

It no longer has the 'monitor path' tag/field.
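
The same check can be scripted for every status file in the directory. This is a minimal sketch, assuming the element is always written as <monitor path=...> as in the output above; the function names are hypothetical, and reading /run/libvirt/qemu needs root:

```shell
# Return 0 if the status XML contains a <monitor path=...> element.
status_xml_has_monitor() {
    grep -q '<monitor path=' "$1"
}

# Print every *.xml file in the given directory that lacks the element.
list_broken_status_xml() {
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue   # glob matched nothing
        status_xml_has_monitor "$f" || printf '%s\n' "$f"
    done
    return 0
}

# Usage: sudo sh -c '. ./check.sh; list_broken_status_xml /run/libvirt/qemu'
```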

Now, the next time libvirtd starts, it fails to parse that XML:

 $ sudo systemctl start libvirtd.service

 $ journalctl -b -u libvirtd.service | tail
 ...
 ... libvirtd[2789]: internal error: no monitor path
 ... libvirtd[2789]: Failed to load config for domain 'test-vm'

As a result, libvirt is not aware of the running domain, and cannot manage it:

 $ virsh list
  Id   Name   State
 --------------------

 $ virsh list --all
  Id   Name      State
 --------------------------
  -    test-vm   shut off

Even though it is still running:

 $ pgrep -af qemu-system-x86_64 | cut -d, -f1
 2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,
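
Such leftover QEMU processes can also be found from the pid files kept next to the status XML. A minimal sketch, assuming one <name>.pid file per domain in /run/libvirt/qemu (as with test-vm.pid here) and a Linux /proc; the function name is hypothetical:

```shell
# For each <name>.pid file in the given directory, print "<name> <pid>"
# if a process with that pid is still alive (checked via /proc).
list_live_pidfiles() {
    for f in "$1"/*.pid; do
        [ -e "$f" ] || continue                           # glob matched nothing
        pid=$(cat "$f")
        case $pid in ''|*[!0-9]*) continue ;; esac        # skip non-numeric content
        [ -d "/proc/$pid" ] && printf '%s %s\n' "$(basename "$f" .pid)" "$pid"
    done
    return 0
}

# Usage: sudo sh -c '. ./orphans.sh; list_live_pidfiles /run/libvirt/qemu'
```

Any name printed here that 'virsh list' does not show as running is a candidate for the situation above.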

Stop it manually:

 $ sudo kill $(sudo cat /run/libvirt/qemu/test-vm.pid)
 $ sudo rm /run/libvirt/qemu/test-vm.{xml,pid}