Comment 32 for bug 2059272

Mauricio Faria de Oliveira (mfo) wrote :

Verification done on focal-proposed, following comments 23, 24, 25, 26.

Including in this comment a few key snippets from each test/comment.

---
Environment
---

LXD virtual machine

 lxc launch --vm ubuntu:focal lp2059272-focal
 lxc exec lp2059272-focal -- su - ubuntu

Enable proposed & debug symbols

 cat <<EOF | sudo tee /etc/apt/sources.list.d/proposed.list
 deb http://archive.ubuntu.com/ubuntu focal-proposed main universe
 deb http://ddebs.ubuntu.com focal-proposed main universe
 EOF

 cat <<EOF | sudo tee /etc/apt/preferences.d/proposed
 Package: *
 Pin: release a=focal-proposed
 Pin-Priority: 400
 EOF

 sudo apt install --yes --no-install-recommends gdb qemu-system-x86 ubuntu-dbgsym-keyring
 sudo apt update
 sudo apt install --yes --no-install-recommends -t focal-proposed libvirt{0,-daemon{,-driver-qemu,-system}}{,-dbgsym} libvirt-clients

 $ apt-cache policy libvirt-daemon-driver-qemu
 libvirt-daemon-driver-qemu:
   Installed: 6.0.0-0ubuntu8.20
   Candidate: 6.0.0-0ubuntu8.20
   Version table:
  *** 6.0.0-0ubuntu8.20 400
         400 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
         100 /var/lib/dpkg/status
      6.0.0-0ubuntu8.19 500
         500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
         500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
      6.0.0-0ubuntu8 500
         500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages

 newgrp libvirt # or logout/login

Libvirtd debug logging

 cat <<-EOF | sudo tee -a /etc/libvirt/libvirtd.conf
 log_filters="1:qemu 1:libvirt"
 log_outputs="3:syslog:libvirtd 1:file:/var/log/libvirt/libvirtd-debug.log"
 EOF

---
Steps with test packages on Focal (normal restarts)
---

 <...>
 for SLEEP in $(seq 0.1 0.1 2.0); do
 <...>

All VMs are still managed by libvirt:

 $ virsh list
  Id   Name         State
 ----------------------------
  1    test-vm-1    running
  2    test-vm-2    running
  3    test-vm-3    running
  4    test-vm-4    running
  5    test-vm-5    running
  6    test-vm-6    running
  7    test-vm-7    running
  8    test-vm-8    running
  9    test-vm-9    running
  10   test-vm-10   running
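
The "all VMs are still managed" check above can also be done by counting rows instead of reading the table by eye. This is only a sketch: `count_running` is a hypothetical helper (not part of the original test steps), and it assumes the usual `virsh list` layout of two header lines followed by one row per domain, with the state in the last column.

```shell
# Hypothetical helper (not part of the original test steps).
# Reads `virsh list` output on stdin and prints how many domains are "running".
count_running() {
    # Skip the two header lines; the state is the last field of each row.
    awk 'NR > 2 && $NF == "running" { n++ } END { print n + 0 }'
}
```

Possible usage, with 10 being the number of test VMs shown above:

 [ "$(virsh list | count_running)" -eq 10 ] && echo "all 10 VMs still managed"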

---
Steps with test packages on Focal (shutdown-on-init)
---

Scenario 1) Shutdown wins race against XML update (ie, shutdown happens first)

<...>

Now, let the qemuProcessReconnect thread continue; it will not update the XML file,
because 'quit' is set (ie, shutdown in progress):

 (gdb) t 20
 (gdb) p ((virNetDaemonPtr)anyobj)->quit
 $2 = true

 $ ls -l /run/libvirt/qemu/test-vm.xml
 -rw------- 1 root root 10189 Apr 24 12:02 /run/libvirt/qemu/test-vm.xml

 (gdb) c &

 $ ls -l /run/libvirt/qemu/test-vm.xml
 -rw------- 1 root root 10189 Apr 24 12:02 /run/libvirt/qemu/test-vm.xml

 <...>

 $ sudo grep 'Leaving the update of .* domain status XML' /var/log/libvirt/libvirtd-debug.log
 2024-04-24 12:08:40.054+0000: 3770: info : qemuProcessReconnect:8157 : Leaving the update of 'test-vm' domain status XML for the next initialization (shutdown detected on this initialization).

 <...>

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='3726'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>
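
The before/after `ls -l` comparison in this scenario can be scripted by saving the file's modification time and size, and comparing them later. A minimal sketch, assuming GNU `stat` (as shipped on Ubuntu); `file_stamp` and `file_unchanged` are hypothetical helper names, not part of the original test steps.

```shell
# Hypothetical helpers (not part of the original test steps).
# Print "<mtime-in-seconds> <size-in-bytes>" for a file (GNU stat).
file_stamp() {
    stat -c '%Y %s' "$1"
}

# Succeed only if the file's current stamp equals a previously saved one.
file_unchanged() {
    [ "$(file_stamp "$1")" = "$2" ]
}
```

Possible usage around the gdb step:

 before=$(file_stamp /run/libvirt/qemu/test-vm.xml)
 # ... let the thread continue in gdb ...
 file_unchanged /run/libvirt/qemu/test-vm.xml "$before" && echo "status XML not rewritten"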

Scenario 2) Shutdown loses race against XML update (ie, update happens first)

<...>

Instead, let the qemuProcessReconnect thread take the lock and update the XML file, but not unlock yet:

 <...>

 $ ls -l /run/libvirt/qemu/test-vm.xml
 -rw------- 1 root root 10189 Apr 24 12:02 /run/libvirt/qemu/test-vm.xml

 (gdb) b virObjectUnlock thread 20 if anyobj == $ptr
 (gdb) c

 $ ls -l /run/libvirt/qemu/test-vm.xml
 -rw------- 1 root root 10189 Apr 24 12:14 /run/libvirt/qemu/test-vm.xml

 <...>

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='3726'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>

Scenario 3) Shutdown happens during QEMU monitor calls (ie, calls don't finish)

<...>

The XML was not updated, as expected:

 $ ls -l /run/libvirt/qemu/test-vm.xml
 -rw------- 1 root root 10189 Apr 24 12:14 /run/libvirt/qemu/test-vm.xml

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='3726'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>
<...>

Now, the next time libvirtd starts, it correctly parses that XML:

 $ sudo systemctl start libvirtd.service

 $ journalctl -b -u libvirtd.service | grep -A1 error
 $
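
The empty `grep` output above is the pass condition. For scripted runs, the same check can be expressed as an exit status. This is a sketch only; `no_error_lines` is a hypothetical helper, and it matches the lowercase string "error" just like the `grep` above.

```shell
# Hypothetical helper (not part of the original test steps).
# Reads log text on stdin; succeeds only if no line contains "error".
no_error_lines() {
    ! grep -q 'error'
}
```

Possible usage:

 journalctl -b -u libvirtd.service | no_error_lines && echo "no errors logged"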

And libvirt is aware of the domain, and can manage it:

 $ virsh list
  Id   Name      State
 -------------------------
  1    test-vm   running

 $ virsh destroy test-vm
 Domain test-vm destroyed

 $ virsh undefine test-vm
 Domain test-vm has been undefined

---
Steps with test packages on Focal (shutdown-on-runtime)
---

<...>
Check the formatter/options again; the xmlopt object is *STILL* referenced (refs = 0x1), and the formatter is no longer 0x0:

 (gdb) t 20
 (gdb) p xmlopt.privateData.format
 $3 = (virDomainXMLPrivateDataFormatFunc) 0x7fd08c3437c0 <qemuDomainObjPrivateXMLFormat>
 (gdb) p/x xmlopt.parent
 $4 = {u = {dummy_align1 = 0x1cafe0026, dummy_align2 = 0x1cafe0026, s = {magic = 0xcafe0026, refs = 0x1}}, klass = 0x7fd080043170}

Let the save function continue, and libvirt finishes shutting down:
<...>
Check the VM status XML *after*:

 $ ls -l /run/libvirt/qemu/test-vm.xml
 -rw------- 1 root root 10251 Apr 24 12:28 /run/libvirt/qemu/test-vm.xml

 $ sudo grep -e '<domstatus' -e '<domain' -e 'monitor path' /run/libvirt/qemu/test-vm.xml
 <domstatus state='running' reason='booted' pid='4055'>
   <monitor path='/var/lib/libvirt/qemu/domain-1-test-vm/monitor.sock' type='unix'/>
   <domain type='qemu' id='1'>
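
As an extra sanity check on the status XML, the pid attribute of the `<domstatus>` element can be extracted and compared against the running processes. A minimal sketch; `domstatus_pid` is a hypothetical helper (not part of the original test steps), and it assumes the attribute layout shown in the grep output above.

```shell
# Hypothetical helper (not part of the original test steps).
# Reads status XML on stdin; prints the pid attribute of the <domstatus> element.
domstatus_pid() {
    sed -n "s/.*<domstatus[^>]* pid='\([0-9][0-9]*\)'.*/\1/p"
}
```

Possible usage, assuming that pid is the QEMU process of the domain:

 pid=$(sudo cat /run/libvirt/qemu/test-vm.xml | domstatus_pid)
 [ -d "/proc/$pid" ] && echo "process $pid is still running"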

Now, the next time libvirtd starts, it correctly parses that XML:

 $ sudo systemctl start libvirtd.service

 $ journalctl -b -u libvirtd.service | grep -A1 error
 $

And libvirt is aware of the domain, and can manage it:

 $ virsh list
  Id   Name      State
 -------------------------
  1    test-vm   running

 $ virsh destroy test-vm
 Domain test-vm destroyed

 $ virsh undefine test-vm
 Domain test-vm has been undefined