Comment 13 for bug 2059272

Mauricio Faria de Oliveira (mfo) wrote:

Steps to reproduce on Focal (shutdown-on-init)
---

LXD virtual machine

 lxc launch --vm ubuntu:focal lp2059272-focal
 lxc exec lp2059272-focal -- su - ubuntu

Latest Packages and Debug Symbols:

 cat <<EOF | sudo tee /etc/apt/sources.list.d/proposed.list
 deb http://archive.ubuntu.com/ubuntu focal-proposed main universe
 deb http://ddebs.ubuntu.com focal-proposed main restricted
 EOF

 cat <<EOF | sudo tee /etc/apt/preferences.d/proposed
 Package: *
 Pin: release a=focal-proposed
 Pin-Priority: 400
 EOF

 sudo apt install --yes --no-install-recommends gdb qemu-system-x86 ubuntu-dbgsym-keyring
 sudo apt update
 sudo apt install --yes --no-install-recommends -t focal-proposed libvirt{0,-daemon{,-driver-qemu,-system}}{,-dbgsym} libvirt-clients

 $ dpkg -l | awk '$2 ~ /^libvirt/ { print $2, $3 }'
 libvirt-clients 6.0.0-0ubuntu8.17
 libvirt-daemon 6.0.0-0ubuntu8.17
 libvirt-daemon-dbgsym 6.0.0-0ubuntu8.17
 libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.17
 libvirt-daemon-driver-qemu-dbgsym 6.0.0-0ubuntu8.17
 libvirt-daemon-system 6.0.0-0ubuntu8.17
 libvirt-daemon-system-systemd 6.0.0-0ubuntu8.17
 libvirt0:amd64 6.0.0-0ubuntu8.17
 libvirt0-dbgsym:amd64 6.0.0-0ubuntu8.17
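
GDB resolves source lines reliably only when the debug symbols match the
installed binaries, so it can help to confirm that all packages listed above
share one version. A minimal sketch, assuming the `name version` lines from
the dpkg/awk command are fed on stdin; `distinct_versions` is a hypothetical
helper name, not an existing tool.

```shell
# Hypothetical helper: read "package version" lines on stdin and print
# each distinct version once. One output line means all versions match.
distinct_versions() {
    awk '{ print $2 }' | sort -u
}
```

Usage:

 dpkg -l | awk '$2 ~ /^libvirt/ { print $2, $3 }' | distinct_versions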

Start test VM

 cat <<-EOF >test-vm.xml
 <domain type='qemu'>
   <name>test-vm</name>
   <os>
     <type>hvm</type>
   </os>
   <memory unit='MiB'>32</memory>
   <vcpu>1</vcpu>
 </domain>
 EOF

 virsh define test-vm.xml
 virsh start test-vm

 $ virsh list
  Id   Name      State
 -------------------------
  1    test-vm   running

Stop libvirt systemd units

 sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

 sudo gdb \
   -iex 'set confirm off' \
   -iex 'set pagination off' \
   -ex 'set non-stop on' \
   -ex 'handle SIGTERM nostop noprint pass' \
   -ex 'add-symbol-file /usr/sbin/libvirtd' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
   /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and domain status XML save

 b qemuStateCleanup
 b virDomainObjSave
 run

 Thread 20 "libvirtd" hit Breakpoint 2, virDomainObjSave (obj=0x555cf5d83480, xmlopt=0x555cf5d7e6a0, statusDir=0x555cf5d26e70 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

Check the backtrace of the domain status XML save function, coming from QEMU process reconnect:

 t 20

 (gdb) bt
 #0 virDomainObjSave (obj=0x555cf5d83480, xmlopt=0x555cf5d7e6a0, statusDir=0x555cf5d26e70 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157
 #1 0x00007f743b666268 in qemuProcessReconnect (opaque=<optimized out>) at ../../../src/qemu/qemu_process.c:8122
 #2 0x00007f7460b9054a in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
 #3 0x00007f7460851609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #4 0x00007f7460776353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Leave the thread stopped at this point.

Now trigger the shutdown path from another shell:

 $ sudo kill $(pidof libvirtd)

 Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1118

Check that two threads are stopped: one in cleanup, one in the domain status XML save:

 (gdb) i th

   Id Target Id Frame
   1 Thread 0x7f745cd37b40 (LWP 8029) "libvirtd" qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1118
   2 Thread 0x7f745c8a9700 (LWP 8034) "libvirtd" (running)
 ...
   18 Thread 0x7f7417fff700 (LWP 8100) "libvirtd" (running)
 * 20 Thread 0x7f7416ffd700 (LWP 8105) "libvirtd" virDomainObjSave (obj=0x555cf5d83480, xmlopt=0x555cf5d7e6a0, statusDir=0x555cf5d26e70 "/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

Confirm that the QEMU driver's domain XML options object is still referenced and its private-data formatter is set:

 t 20

 (gdb) p xmlopt.privateData.format
 $1 = (virDomainXMLPrivateDataFormatFunc) 0x7f743b628810 <qemuDomainObjPrivateXMLFormat>

 (gdb) p xmlopt.parent.u.s.refs
 $2 = 1

 (gdb) p/x xmlopt.parent
 $3 = {u = {dummy_align1 = 0x1cafe0027, dummy_align2 = 0x1cafe0027, s = {magic = 0xcafe0027, refs = 0x1}}, klass = 0x555cf5dbbbb0}

Let the cleanup function finish

 t 1
 finish

Check the formatter/options again; the formatter is now zeroed, and the object is about to be used after being freed:

 t 20

 (gdb) p xmlopt.privateData.format
 $5 = (virDomainXMLPrivateDataFormatFunc) 0x0

 (gdb) p xmlopt.parent.u.s.refs
 $6 = 21852

 (gdb) p/x xmlopt.parent
 $7 = {u = {dummy_align1 = 0x555cf5d39800, dummy_align2 = 0x555cf5d39800, s = {magic = 0xf5d39800, refs = 0x555c}}, klass = 0x555cf5d09010}

In Focal, the object data is zeroed on the last unreference,
and its contents have since changed, as this really is a use-after-free
(another thread might be handed that memory and use it).
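
The two `p/x xmlopt.parent` dumps can be decoded by hand. Judging from the
GDB output, the low 32 bits of `dummy_align1` hold `magic` and the high
32 bits hold `refs`. A minimal sketch of that arithmetic in shell; the
function names are hypothetical helpers, and treating the `0xcafe` prefix
as the mark of a live object is an assumption based on the first dump.

```shell
# Hypothetical helpers (not part of libvirt or GDB) that split the 64-bit
# word GDB prints as dummy_align1 into its two 32-bit halves.
# Assumes the little-endian x86_64 layout seen in the dumps above.

# Low 32 bits: the 'magic' field, printed in hex.
obj_magic() {
    printf '0x%x\n' $(( $1 & 0xffffffff ))
}

# High 32 bits: the 'refs' field, printed in decimal.
obj_refs() {
    echo $(( ($1 >> 32) & 0xffffffff ))
}

# Assumption: a live object carries 0xcafe in the upper 16 bits of
# 'magic', as in the first dump. Succeeds only in that case.
obj_looks_valid() {
    [ $(( ($1 >> 16) & 0xffff )) -eq $(( 0xcafe )) ]
}
```

For the second dump, `obj_refs 0x555cf5d39800` prints 21852, the value GDB
shows as `$6`, and `obj_looks_valid 0x555cf5d39800` fails: the "reference
count" is just the upper half of whatever now occupies that memory.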

Check the VM status XML *before* the save function finishes:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='7932'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>

Let the save function continue

 (gdb) c &

Check the VM status XML *after*:

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='7932'>
   <domain type='qemu' id='1'>

It no longer has the 'monitor path' tag/field.
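
The same grep-based check can be applied to every status file at once. A
rough sketch, assuming each running QEMU domain's status XML should contain
a `<monitor path=` element as in the "before" output above;
`list_status_without_monitor` is a hypothetical helper name.

```shell
# Hypothetical helper: print every status XML in a directory that lacks
# a <monitor path=...> element. The directory defaults to the statusDir
# seen in the GDB output; pass another one as $1 to override.
list_status_without_monitor() {
    dir=${1:-/run/libvirt/qemu}
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue          # no XML files at all
        grep -q '<monitor path=' "$f" || echo "$f"
    done
}
```

The status directory is read with sudo elsewhere in these steps, so this
likely needs a root shell as well.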

Let libvirt finish shutting down:

 (gdb) t 1
 (gdb) c
 Continuing.
 ...
 [Inferior 1 (process 8029) exited normally]
 (gdb) quit

Now, the next time libvirtd starts, it fails to parse that XML:

 $ sudo systemctl start libvirtd.service

 $ journalctl -b -u libvirtd.service | tail
 ...
 ... libvirtd[8297]: 8313: error : qemuDomainObjPrivateXMLParse:3678 : internal error: no monitor path
 ... libvirtd[8297]: 8313: error : virDomainObjListLoadAllConfigs:632 : Failed to load config for domain 'test-vm'
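
To pull just the affected domain names out of the journal, the error line
can be filtered. A minimal sketch, assuming the message format shown above;
`failed_domains` is a hypothetical helper name.

```shell
# Hypothetical helper: read log lines on stdin and print the domain name
# from each "Failed to load config for domain '<name>'" message.
failed_domains() {
    sed -n "s/.*Failed to load config for domain '\([^']*\)'.*/\1/p"
}
```

Usage:

 journalctl -b -u libvirtd.service | failed_domains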

As a result, libvirt is unaware that the domain is running, and cannot manage it:

 $ virsh list
  Id   Name   State
 --------------------

 $ virsh list --all
  Id   Name      State
 --------------------------
  -    test-vm   shut off

Even though it is still running:

 $ pgrep -af qemu-system-x86_64 | cut -d, -f1
 7932 /usr/bin/qemu-system-x86_64 -name guest=test-vm
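
The mismatch between the QEMU processes that exist and the domains libvirt
reports as running can be checked mechanically. A rough sketch, assuming
QEMU command lines carry `-name guest=NAME` as in the output above;
`guest_names` and `not_in_list` are hypothetical helper names.

```shell
# Hypothetical helper: read 'pgrep -af' lines on stdin and print the
# guest name from each '-name guest=NAME' argument (the name ends at
# the first comma or space).
guest_names() {
    sed -n 's/.*-name guest=\([^, ]*\).*/\1/p'
}

# Hypothetical helper: print the names on stdin that do not appear as a
# whole line in file $1 (for example, saved 'virsh list --name' output).
not_in_list() {
    grep -vxF -f "$1"
}
```

Usage:

 virsh list --name >known.txt
 pgrep -af qemu-system-x86_64 | guest_names | not_in_list known.txt

Any name printed belongs to a guest that is running but that libvirt does
not list as running.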

Stop it manually and remove its stale status files:

 $ sudo kill $(pgrep -f qemu-system-x86_64)
 $ sudo rm /run/libvirt/qemu/test-vm.{xml,pid}

Start libvirt:

 sudo systemctl start libvirtd.service