Comment 26 for bug 2059272

Mauricio Faria de Oliveira (mfo) wrote :

Steps with test packages on Focal (shutdown-on-runtime)
---

Stop libvirtd systemd units

  sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

  sudo gdb \
    -iex 'set confirm off' \
    -iex 'set pagination off' \
    -ex 'set non-stop on' \
    -ex 'handle SIGTERM nostop noprint pass' \
    -ex 'add-symbol-file /usr/sbin/libvirtd' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
    /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and device deleted event

  b qemuStateCleanup
  b processDeviceDeletedEvent
  run

Start a test VM with a USB mouse device

  cat <<-EOF >test-vm.xml
  <domain type='qemu'>
    <name>test-vm</name>
    <os>
      <type>hvm</type>
    </os>
    <memory unit='MiB'>32</memory>
    <vcpu>1</vcpu>
    <devices>
      <input type='mouse' bus='usb'/>
    </devices>
  </domain>
  EOF

  virsh define test-vm.xml
  virsh start test-vm

  $ virsh list
   Id   Name      State
  -------------------------
   1    test-vm   running

Delete the USB mouse device

  DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | grep 'dev: usb-mouse' | cut -d'"' -f2)
  virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"
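
The ID lookup above can be wrapped in a small guard so that device_del is never sent with an empty ID. This is only a minimal sketch: it assumes 'info qtree' prints the device as a line like  dev: usb-mouse, id "input0"  (the layout the grep/cut pipeline above relies on), and the function name qtree_device_id is hypothetical, not part of libvirt or QEMU.

```shell
#!/bin/sh
# Hypothetical helper (not a libvirt/QEMU tool): read 'info qtree' text on
# stdin and print the ID of the first device of the given type.
# Assumes lines of the form:  dev: usb-mouse, id "input0"
qtree_device_id() {
    dev_type=${1:-usb-mouse}
    # The ID is the first double-quoted field on the matching 'dev:' line;
    # prints an empty string if no such device is found.
    grep "dev: ${dev_type}" | head -n 1 | cut -d'"' -f2
}
```

It could then be used like this, stopping early if no device was found:

  DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | qtree_device_id)
  [ -n "$DEVICE_ID" ] || echo 'usb-mouse device not found' >&2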

Back to GDB

  Thread 20 "libvirtd" hit Breakpoint 2, 0x00007ffba902204e in processDeviceDeletedEvent (devAlias=<optimized out>, vm=0x7ffbac00de90, driver=0x7ffbac021380) at ../../../src/qemu/qemu_driver.c:4888

Add breakpoint to domain status XML save, and continue the thread above

  b virDomainObjSave
  t 20
  c

 Thread 20 "libvirtd" hit Breakpoint 3, virDomainObjSave (obj=0x7ffbac00de90, xmlopt=0x7ffbac044130, statusDir=0x7ffbac01f530 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

Check the backtrace of the domain status XML save function, coming from device deleted event

  (gdb) bt
 #0 virDomainObjSave (obj=0x7ffbac00de90, xmlopt=0x7ffbac044130, statusDir=0x7ffbac01f530 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157
 #1 0x00007ffba9022127 in processDeviceDeletedEvent (devAlias=0x556074b5e3f0 "input0", vm=0x7ffbac00de90, driver=0x7ffbac021380) at ../../../src/qemu/qemu_driver.c:4312
 #2 qemuProcessEventHandler (data=0x556074b63a10, opaque=0x7ffbac021380) at ../../../src/qemu/qemu_driver.c:4888
 #3 0x00007ffbbee8f1af in virThreadPoolWorker (opaque=opaque@entry=0x556074c047a0) at ../../../src/util/virthreadpool.c:163
 #4 0x00007ffbbee8e51c in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
 #5 0x00007ffbbeb4f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #6 0x00007ffbbea74353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Leave the thread at this point

Let's trigger the shutdown path

  $ sudo kill $(pidof libvirtd)

 Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1127

Check the function pointer is non-NULL _before_ cleanup

 (gdb) p xmlopt.privateData.format
 $1 = (virDomainXMLPrivateDataFormatFunc) 0x7ffba8f7c7c0 <qemuDomainObjPrivateXMLFormat>

 (gdb) p/x xmlopt.parent
 $2 = {u = {dummy_align1 = 0x1cafe0027, dummy_align2 = 0x1cafe0027, s = {magic = 0xcafe0027, refs = 0x1}}, klass = 0x7ffbac044100}

Let cleanup run:

 t 1
 c &

Check the formatter/options again; they are *STILL* referenced, and the function pointer is no longer 0x0 at this point:

 (gdb) p xmlopt.privateData.format
 $3 = (virDomainXMLPrivateDataFormatFunc) 0x7ffba8f7c7c0 <qemuDomainObjPrivateXMLFormat>

 (gdb) p/x xmlopt.parent
 $4 = {u = {dummy_align1 = 0x1cafe0027, dummy_align2 = 0x1cafe0027, s = {magic = 0xcafe0027, refs = 0x1}}, klass = 0x7ffbac044100}

Check that the shutdown/cleanup thread is waiting for the worker thread
(still stopped in the save function), in the path to free the worker thread pool:

 (gdb) i th 1
   Id Target Id Frame
   1 Thread 0x7ffbbb035b40 (LWP 5887) "libvirtd" (running)
 (gdb) t 1
 (gdb) interrupt
 (gdb) bt
 #0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffbac05fd60) at ../sysdeps/nptl/futex-internal.h:183
 #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffbac05fce0, cond=0x7ffbac05fd38) at pthread_cond_wait.c:508
 #2 __pthread_cond_wait (cond=0x7ffbac05fd38, mutex=0x7ffbac05fce0) at pthread_cond_wait.c:647
 #3 0x00007ffbbee8e79b in virCondWait (c=<optimized out>, m=<optimized out>) at ../../../src/util/virthread.c:144
 #4 0x00007ffbbee8f438 in virThreadPoolFree (pool=<optimized out>) at ../../../src/util/virthreadpool.c:286
 #5 0x00007ffba8fed5d1 in qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1131
 #6 0x00007ffbbf02c47f in virStateCleanup () at ../../../src/libvirt.c:669
 #7 0x0000556072acebc8 in main (argc=<optimized out>, argv=<optimized out>) at ../../../src/remote/remote_daemon.c:1447

Let the save function continue, and libvirt finishes shutting down:

  (gdb) c &
  Continuing.
  (gdb) t 20
  (gdb) c
 [Inferior 1 (process 5887) exited normally]
  (gdb) q

Check the VM status XML *after*:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='5996'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>
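
The check above can also be scripted so that it fails loudly when an element is missing. This is only a sketch, under the assumption that a usable status XML contains all three elements the grep above looks for; the function name status_xml_complete is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given status XML file contains
# every element checked manually above (domstatus, domain, monitor path).
status_xml_complete() {
    xml=$1
    for pattern in '<domstatus' '<domain' 'monitor path'; do
        if ! grep -q -e "$pattern" "$xml"; then
            echo "missing '$pattern' in $xml" >&2
            return 1
        fi
    done
}
```

Run as root (the status directory is not world-readable), it would be called as:

  status_xml_complete /run/libvirt/qemu/test-vm.xml && echo 'status XML looks complete'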

Now, the next time libvirtd starts, it correctly parses that XML:

  $ sudo systemctl start libvirtd.service

  $ journalctl -b -u libvirtd.service | grep -A1 error
  $

And libvirt is aware of the domain, and can manage it:

  $ virsh list
   Id   Name      State
  -------------------------
   1    test-vm   running

  $ virsh destroy test-vm
  Domain test-vm destroyed

  $ virsh undefine test-vm
 Domain test-vm has been undefined