Comment 12 for bug 2059272

Mauricio Faria de Oliveira (mfo) wrote:

Steps to reproduce on Focal (shutdown-on-runtime)
---

LXD virtual machine

 lxc launch --vm ubuntu:focal lp2059272-focal
 lxc exec lp2059272-focal -- su - ubuntu

Latest Packages and Debug Symbols:

 cat <<EOF | sudo tee /etc/apt/sources.list.d/proposed.list
 deb http://archive.ubuntu.com/ubuntu focal-proposed main universe
 deb http://ddebs.ubuntu.com focal-proposed main restricted
 EOF

 cat <<EOF | sudo tee /etc/apt/preferences.d/proposed
 Package: *
 Pin: release a=focal-proposed
 Pin-Priority: 400
 EOF

 sudo apt install --yes --no-install-recommends gdb qemu-system-x86 ubuntu-dbgsym-keyring
 sudo apt install --yes --no-install-recommends -t focal-proposed libvirt{0,-daemon{,-driver-qemu,-system}}{,-dbgsym} libvirt-clients

 $ dpkg -l | awk '$2 ~ /^libvirt/ { print $2, $3 }'
 libvirt-clients 6.0.0-0ubuntu8.17
 libvirt-daemon 6.0.0-0ubuntu8.17
 libvirt-daemon-dbgsym 6.0.0-0ubuntu8.17
 libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.17
 libvirt-daemon-driver-qemu-dbgsym 6.0.0-0ubuntu8.17
 libvirt-daemon-system 6.0.0-0ubuntu8.17
 libvirt-daemon-system-systemd 6.0.0-0ubuntu8.17
 libvirt0:amd64 6.0.0-0ubuntu8.17
 libvirt0-dbgsym:amd64 6.0.0-0ubuntu8.17

Stop libvirt systemd units

 sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

  sudo gdb \
    -iex 'set confirm off' \
    -iex 'set pagination off' \
    -ex 'set non-stop on' \
    -ex 'handle SIGTERM nostop noprint pass' \
    -ex 'add-symbol-file /usr/sbin/libvirtd' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
    /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and device deleted event

 b qemuStateCleanup
 b processDeviceDeletedEvent
 run

Start test VM with a USB mouse device

  cat <<-EOF >test-vm.xml
  <domain type='qemu'>
    <name>test-vm</name>
    <os>
      <type>hvm</type>
    </os>
    <memory unit='MiB'>32</memory>
    <vcpu>1</vcpu>
    <devices>
      <input type='mouse' bus='usb'/>
    </devices>
  </domain>
EOF

 virsh define test-vm.xml
 virsh start test-vm

 $ virsh list
 Id Name State
 -------------------------
 1 test-vm running

Delete the USB mouse device

 DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | grep 'dev: usb-mouse' | cut -d'"' -f2)
 virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"
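The device ID lookup above can be wrapped in a small filter so it is easier to reuse and to check against captured output. This is only a sketch: the function name qtree_usb_mouse_id is made up for illustration, and it assumes the 'info qtree' line has the form dev: usb-mouse, id "input0" (the quoted ID is what the cut command above relies on).

```shell
# Hypothetical helper (not part of libvirt or QEMU): read 'info qtree'
# output on stdin and print the quoted id of the first usb-mouse device.
qtree_usb_mouse_id() {
    # Split on double quotes; field 2 is the text inside the first pair.
    awk -F'"' '/dev: usb-mouse/ { print $2; exit }'
}

# Try it on a sample line (the line format is an assumption, see above).
printf '      dev: usb-mouse, id "input0"\n' | qtree_usb_mouse_id
```

With a running domain it would be used as: virsh qemu-monitor-command test-vm --hmp 'info qtree' | qtree_usb_mouse_id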

Back to GDB

 Thread 20 "libvirtd" hit Breakpoint 2, 0x00007f316df78ffe in processDeviceDeletedEvent (devAlias=<optimized out>, vm=0x7f3178016890, driver=0x7f317802edd0) at ../../../src/qemu/qemu_driver.c:4879

Add breakpoint to domain status XML save, and continue the thread above

 b virDomainObjSave
 t 20
 c

 Thread 20 "libvirtd" hit Breakpoint 3, virDomainObjSave (obj=0x7f3178016890, xmlopt=0x7f3178037780, statusDir=0x7f317800aa40 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

Check the backtrace of the domain status XML save function, coming from device deleted event

 (gdb) bt
 #0 virDomainObjSave (obj=0x7f3178016890, xmlopt=0x7f3178037780, statusDir=0x7f317800aa40 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157
 #1 0x00007f316df790d7 in processDeviceDeletedEvent (devAlias=0x5654c5839310 "input0", vm=0x7f3178016890, driver=0x7f317802edd0) at ../../../src/qemu/qemu_driver.c:4303
 #2 qemuProcessEventHandler (data=0x5654c5879e80, opaque=0x7f317802edd0) at ../../../src/qemu/qemu_driver.c:4879
 #3 0x00007f318b59f1af in virThreadPoolWorker (opaque=opaque@entry=0x5654c5838160) at ../../../src/util/virthreadpool.c:163
 #4 0x00007f318b59e51c in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
 #5 0x00007f318b25f609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
 #6 0x00007f318b184353 in clone () from /lib/x86_64-linux-gnu/libc.so.6

Leave the thread at this point

Let's trigger the shutdown path

 $ sudo kill $(pidof libvirtd)

 Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1118

Check there are 2 stopped threads: cleanup and domain status XML save

 (gdb) i th
   Id Target Id Frame
   1 Thread 0x7f3187745b40 (LWP 4969) "libvirtd" qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1118
   2 Thread 0x7f31872af700 (LWP 4973) "libvirtd" (running)
 ...
   18 Thread 0x7f315e7fc700 (LWP 4989) "libvirtd" (running)
 * 20 Thread 0x7f315dffb700 (LWP 5045) "libvirtd" virDomainObjSave (obj=0x7f3178016890, xmlopt=0x7f3178037780, statusDir=0x7f317800aa40 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

Confirm the qemu driver's domain xml formatter/options is set/referenced:

 t 20

 (gdb) p xmlopt.privateData.format
 $1 = (virDomainXMLPrivateDataFormatFunc) 0x7f316ded3810 <qemuDomainObjPrivateXMLFormat>

 (gdb) p/x xmlopt.parent
 $3 = {u = {dummy_align1 = 0x1cafe0026, dummy_align2 = 0x1cafe0026, s = {magic = 0xcafe0026, refs = 0x1}}, klass = 0x7f3178037750}

Let the cleanup function and shutdown path finish

 t 1
 c &

Check the formatter/options again; it is *NO* longer referenced:

 t 20

 (gdb) p xmlopt.privateData.format
 $4 = (virDomainXMLPrivateDataFormatFunc) 0x0

 (gdb) p/x xmlopt.parent
 $5 = {u = {dummy_align1 = 0x7f317805d170, dummy_align2 = 0x7f317805d170, s = {magic = 0x7805d170, refs = 0x7f31}}, klass = 0x7f3178000080}

The object's data is zeroed on the last unreference in Focal,
and its contents have since changed, as this is really a use-after-free
(another thread might get and use that memory).

Check the VM status XML *before* the save function finishes:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='5031'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>

Let the save function continue, and libvirt finishes shutting down:

 (gdb) c
 Continuing.
 ...
 [Inferior 1 (process 4969) exited normally]

 (gdb) q

Check the VM status XML *after*:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='5031'>
   <domain type='qemu' id='1'>

It no longer has the 'monitor path' tag/field.
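To look for other domains hit by the same problem, the grep above can be turned into a loop over every status XML file. A minimal sketch, assuming the status files live in /run/libvirt/qemu as in this reproducer; the function name status_xml_missing_monitor is hypothetical, and reading that directory will likely require root.

```shell
# Hypothetical helper: print every status XML file in the given directory
# (default: /run/libvirt/qemu) that has no <monitor path=...> element.
status_xml_missing_monitor() {
    dir=${1:-/run/libvirt/qemu}
    for f in "$dir"/*.xml; do
        # Skip the unexpanded glob when the directory has no XML files.
        [ -e "$f" ] || continue
        grep -q '<monitor path=' "$f" || printf '%s\n' "$f"
    done
}
```

An empty result means every status file found still carries the monitor path.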

Now, the next time libvirtd starts, it fails to parse that XML:

 $ sudo systemctl start libvirtd.service

 $ journalctl -b -u libvirtd.service | grep error

 $ journalctl -b -u libvirtd.service | grep -A1 error
 Mar 30 21:58:37 lp2059272-focal libvirtd[5063]: internal error: no monitor path
 Mar 30 21:58:37 lp2059272-focal libvirtd[5063]: Failed to load config for domain 'test-vm'
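When more than one domain is involved, the affected names can be pulled out of the journal instead of read by eye. A small sketch that matches the "Failed to load config for domain" message shown above; the function name failed_domains is made up for illustration.

```shell
# Hypothetical helper: read journal lines on stdin and print the name of
# each domain whose config failed to load.
failed_domains() {
    sed -n "s/.*Failed to load config for domain '\([^']*\)'.*/\1/p"
}
```

It would be used as: journalctl -b -u libvirtd.service | failed_domains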

And libvirt is not aware of the domain, and cannot manage it:

 $ virsh list
 Id Name State
 --------------------

 $ virsh list --all
 Id Name State
 --------------------------
 - test-vm shut off

Even though it is still running:

  $ pgrep -af qemu-system-x86_64 | cut -d, -f1
  5031 /usr/bin/qemu-system-x86_64 -name guest=test-vm

Stop it manually:

  $ sudo kill $(sudo cat /run/libvirt/qemu/test-vm.pid)
  $ sudo rm /run/libvirt/qemu/test-vm.{xml,pid}
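Before killing anything by hand, it may help to list which pid files still point at a live process. A minimal sketch, assuming one <name>.pid file per domain in /run/libvirt/qemu as in this reproducer; the function name live_pidfiles is hypothetical. It uses kill -0, which only probes the process, so it should run as root to see QEMU processes owned by another user.

```shell
# Hypothetical helper: for each <name>.pid file in the given directory
# (default: /run/libvirt/qemu), print "<name> <pid>" if the process exists.
live_pidfiles() {
    dir=${1:-/run/libvirt/qemu}
    for f in "$dir"/*.pid; do
        [ -e "$f" ] || continue
        pid=$(cat "$f")
        # Ignore empty or non-numeric contents.
        case $pid in
            ''|*[!0-9]*) continue ;;
        esac
        name=${f##*/}
        name=${name%.pid}
        # kill -0 sends no signal; it only checks that the pid can be signalled.
        if kill -0 "$pid" 2>/dev/null; then
            printf '%s %s\n' "$name" "$pid"
        fi
    done
}
```

A domain listed here but shown as "shut off" by virsh list --all is a candidate for the manual cleanup above.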

Start libvirt:

 sudo systemctl start libvirtd.service