libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Medium | Mauricio Faria de Oliveira |
Jammy | Fix Released | Medium | Mauricio Faria de Oliveira |
Bug Description
[ Impact ]
* If a race condition occurs on libvirtd shutdown,
a QEMU domain status XML (/run/libvirt/
might lose the QEMU-driver specific information,
such as '<monitor path=.../>'.
(The race condition details are in [Other Info].)
* On the next libvirtd startup, the parsing of that
QEMU domain's status XML fails as '<monitor path='
is not found:
$ journalctl -b -u libvirtd.service | tail
...
... libvirtd[2789]: internal error: no monitor path
... libvirtd[2789]: Failed to load config for domain 'test-vm'
* As a result, the domain is not listed in `virsh list`,
and `virsh` commands targeting it fail.
$ virsh list
Id Name State
-----
* The domain is still running, but libvirt considers
it shut off, which might cause conflicts/issues
with higher-level tools (e.g., openstack nova).
$ virsh list --all
Id Name State
-----
- test-vm shut off
$ pgrep -af qemu-system-x86_64 | cut -d, -f1
2638 /usr/bin/
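The mismatch described above (a QEMU process still running while libvirt no longer lists the domain) could be detected from a script. The following is a minimal sketch, not part of the original report. The function name is hypothetical, and it assumes that libvirt starts QEMU with a `-name guest=<domain>,` argument, which should be verified on the system at hand.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report).
# Succeeds when DOMAIN has a QEMU process but is absent from the active list.
#   $1: domain name
#   $2: output of `virsh list --name`
#   $3: output of `pgrep -af qemu-system-x86_64`
domain_is_orphaned() {
    domain=$1
    active=$2
    procs=$3

    # Assumption: the QEMU command line contains "-name guest=<domain>,".
    printf '%s\n' "$procs" | grep -qF -- "guest=${domain}," || return 1

    # The domain is listed as active, so libvirt still manages it.
    printf '%s\n' "$active" | grep -qxF -- "$domain" && return 1

    return 0
}

# Possible usage on an affected host:
#   domain_is_orphaned test-vm "$(virsh list --name)" \
#       "$(pgrep -af qemu-system-x86_64)" && echo "test-vm is running but unmanaged"
```

Taking the command outputs as arguments keeps the check independent of how they were collected.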
[ Test Plan ]
* (Focal/Jammy) shutdown-
Synthetic reproducer/
* (Focal-only) shutdown-on-init:
Synthetic reproducer/
* On failure, the XML is saved *without* '<monitor path='
and libvirt fails to parse the domain on startup.
The domain is *not* listed in `virsh list`.
* On success, the XML is saved *with* '<monitor path='
and libvirt correctly parses the domain on startup.
The domain is listed in `virsh list`.
* Normal 'restart' testing in comment #5.
* Test packages built successfully in all architectures
with -proposed enabled in Launchpad PPA mfo/lp2059272 [0]
[0] https:/
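The success/failure criterion above (status XML saved with or without '<monitor path=') could be checked mechanically after stopping libvirtd. This is a rough sketch under assumptions: the function name is hypothetical, the default directory is the statusDir value seen in the GDB output further down (/run/libvirt/qemu), and the `*.xml` file naming inside it is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report).
# Prints every status XML file that lacks a '<monitor path=' attribute.
# Returns 1 if at least one such file is found, 0 otherwise.
check_monitor_path() {
    dir=${1:-/run/libvirt/qemu}   # assumed default, taken from statusDir in GDB
    rc=0
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if ! grep -q '<monitor path=' "$f"; then
            echo "missing monitor path: $f"
            rc=1
        fi
    done
    return $rc
}

# Possible usage (reading the status directory likely needs root):
#   sudo sh -c '. ./check.sh; check_monitor_path'
```

A non-zero exit status corresponds to the failure case of the test plan; an empty output corresponds to the success case.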
[ Regression Potential ]
* One patch changes *where* in the libvirt qemu driver's
shutdown path the worker thread pool is stopped/freed:
from _after_ releasing other data to _before_ doing so.
* The other patch (Focal-only) skips the update of the
QEMU domain status XML file during initialization if
libvirt is shutting down. (This is OK since the file
is not going to be used in the current run, which is
shutting down, and it will be updated again in the
next run.)
* Therefore, the potential for regression is limited to
the libvirt qemu driver's shutdown path, and would be
observed when stopping/restarting libvirtd.service.
* The behavior during normal operation is not affected.
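The ordering change in the first patch can be illustrated with a toy model. This is only an analogy in shell, not libvirt code: a background "worker" saves a status file from private data, and "cleanup" releases that data. All names and file contents are hypothetical, and waiting for the worker stands in for stopping the thread pool.

```shell
#!/bin/sh
# Toy model only (hypothetical names, not libvirt code).
#   $1: "fixed" (stop the worker, then release data) or
#       "racy"  (release data, then stop the worker)
# Prints the status file the worker ended up writing.
run_shutdown() {
    order=$1
    dir=$(mktemp -d)
    echo "<monitor path='/tmp/monitor.sock'/>" > "$dir/priv"

    # Worker: a delayed status save that needs the private data.
    (
        sleep 1
        if [ -f "$dir/priv" ]; then
            cat "$dir/priv" > "$dir/status.xml"
        else
            echo "<!-- no monitor -->" > "$dir/status.xml"
        fi
    ) &
    worker=$!

    if [ "$order" = fixed ]; then
        wait "$worker"       # stop the worker first...
        rm -f "$dir/priv"    # ...then release the data
    else
        rm -f "$dir/priv"    # release the data first (old order)
        wait "$worker"
    fi

    cat "$dir/status.xml"
    rm -rf "$dir"
}
```

With `run_shutdown fixed` the saved status keeps the monitor line; with `run_shutdown racy` the worker finds the data already released and saves a status without it, which mirrors the symptom in this bug.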
[Other Info]
* In Focal, race windows exist if libvirtd shuts down
_after_ initialization and _during_ initialization
(which is unlikely in practice, but it's possible.)
Say, 'shutdown'
* In Jammy, only 'shutdown-
due to the introduction of the '.stateShutdown
driver callback (not available in Focal), which
indirectly prevents the 'shutdown-on-init' race
due to additional synchronization with locking.
* For 'shutdown-
It's needed in Focal and Jammy (included in Mantic).
* For 'shutdown-on-init' (Focal-only), we should use a
downstream-only patch (with conservative behavior),
since upstream addressed this issue indirectly with
the '.stateShutdown
(which are not SRU material, ~10 patches, redesign [2])
in 6.8.0.
[1] https:/
$ git describe --contains 152770333449cd3
v9.3.0-rc1~90
$ rmadison -a source libvirt | sed -n '/focal/,$p'
libvirt | 6.0.0-0ubuntu8 | focal | source
libvirt | 6.0.0-0ubuntu8.16 | focal-security | source
libvirt | 6.0.0-0ubuntu8.16 | focal-updates | source
libvirt | 6.0.0-0ubuntu8.17 | focal-proposed | source
libvirt | 8.0.0-1ubuntu7 | jammy | source
libvirt | 8.0.0-1ubuntu7.5 | jammy-security | source
libvirt | 8.0.0-1ubuntu7.8 | jammy-updates | source
libvirt | 9.6.0-1ubuntu1 | mantic | source
libvirt | 10.0.0-2ubuntu1 | noble | source
libvirt | 10.0.0-2ubuntu5 | noble-proposed | source
[2] https:/
[PATCH 00/10] resolve hangs/crashes on libvirtd shutdown
commit 94e45d1042e21e0
Author: Nikolay Shirokovskiy <email address hidden>
Date: Thu Jul 23 09:53:04 2020 +0300
rpc: finish all threads before exiting main loop
$ git describe --contains 94e45d1042e21e0
v6.8.0-rc1~279
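The version reasoning above (fix [1] first appears in v9.3.0, series [2] in v6.8.0, compared against the packaged versions from rmadison) can be sketched as a small comparison. The function names are hypothetical, and `sort -V` is assumed to be available (GNU coreutils). The check only looks at the upstream version, so it cannot see patches backported into a package.

```shell
#!/bin/sh
# Hypothetical helpers (not from the bug report).

# Strip the packaging suffix: 8.0.0-1ubuntu7.8 -> 8.0.0
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when package version $1 is based on an upstream release
# at or after $2 (the first release containing the fix).
has_fix() {
    version_ge "$(upstream_version "$1")" "$2"
}

# Possible usage with the versions listed above:
#   has_fix 9.6.0-1ubuntu1 9.3.0   && echo "mantic: included upstream"
#   has_fix 8.0.0-1ubuntu7.8 9.3.0 || echo "jammy: needs a backport"
```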
[Original Description]
There's a race condition on libvirtd shutdown
that might cause the domain status XML file(s)
to lose the '<monitor path=.../>' tag/field.
This causes an error on libvirtd startup, and
the domain is not listed/managed, even though
it is still running.
$ virsh list
Id Name State
------
1 test-vm running
$ sudo systemctl restart libvirtd.service
$ journalctl -b -u libvirtd.service | tail
...
... libvirtd[2789]: internal error: no monitor path
... libvirtd[2789]: Failed to load config for domain 'test-vm'
$ virsh list
Id Name State
------
$ virsh list --all
Id Name State
------
- test-vm shut off
$ pgrep -af qemu-system-x86_64 | cut -d, -f1
2638 /usr/bin/
tags: |
added: verification-failed-focal removed: verification-needed-focal |
Steps to reproduce on Jammy
---
Stop libvirt systemd units
sudo systemctl stop 'libvirtd*'
Start libvirt in GDB
sudo gdb \
-iex 'set confirm off' \
-iex 'set pagination off' \
-iex 'set debuginfod enabled on' \
-iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
-ex 'set non-stop on' \
-ex 'handle SIGTERM nostop noprint pass' \
-ex 'add-symbol-file /usr/sbin/libvirtd' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
-ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
/usr/sbin/libvirtd
Add breakpoints for qemu driver cleanup and device deleted event
b qemuStateCleanup
b processDeviceDeletedEvent
run
Start a test VM with a USB mouse device
cat <<-EOF >test-vm.xml
<domain type='qemu'>
<name>test-vm</name>
<os>
<type>hvm</type>
</os>
<memory unit='MiB'>32</memory>
<vcpu>1</vcpu>
<devices>
<input type='mouse' bus='usb'/>
</devices>
</domain>
EOF
virsh define test-vm.xml
virsh start test-vm
$ virsh list
Id Name State
------
1 test-vm running
Delete the USB mouse device
DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | grep 'dev: usb-mouse' | cut -d'"' -f2)
virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"
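The device lookup above depends on the text layout of `info qtree`. As a hedged sketch, the extraction can be wrapped in a function that reads the qtree text on stdin, so it can be tried on a sample line before touching the VM. The function name is hypothetical, and the sample line format (`dev: usb-mouse, id "input0"`) is an assumption based on the "input0" alias seen in the backtrace.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report).
# Reads `info qtree` text on stdin and prints the id of the first
# usb-mouse device, i.e. the text between the first pair of double quotes.
usb_mouse_id() {
    grep 'dev: usb-mouse' | head -n 1 | cut -d'"' -f2
}

# Possible usage, refusing to continue when nothing was found:
#   DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | usb_mouse_id)
#   [ -n "$DEVICE_ID" ] || { echo "usb-mouse not found" >&2; exit 1; }
#   virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"
```

Checking for an empty ID avoids sending a `device_del` command with no argument.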
Back to GDB
Thread 25 "qemu-event" hit Breakpoint 2, 0x00007f6179ed20a7 in processDeviceDeletedEvent (devAlias=<optimized out>, vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3536
Add breakpoint to domain status XML save, and continue the thread above
b virDomainObjSave
t 25
c
Thread 25 "qemu-event" hit Breakpoint 3, virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
Check the backtrace of the domain status XML save function, coming from device deleted event
(gdb) bt
#0 virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
#1 0x00007f6179eb68c3 in qemuDomainObjSaveStatus (driver=0x7f6184035e20, obj=0x7f61842f1020) at ../../src/qemu/qemu_domain.c:5801
#2 0x00007f6179ed2159 in processDeviceDeletedEvent (devAlias=0x7f617c0073e0 "input0", vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3557
#3 qemuProcessEventHandler (data=0x7f617c0072b0, opaque=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:4184
#4 0x00007f61974fc983 in virThreadPoolWorker (opaque=<optimized out>) at ../../src/util/virthreadpool.c:164
#5 0x00007f61974fb4d9 in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:241
#6 0x00007f6196e64ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7 0x00007f6196ef6850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Leave the thread at this point
Let's trigger the shutdown path
First, increase the shutdown timer (30 seconds is too fast for me; use 30 minutes)
(gdb) b virEventAddTimeout
$ sudo kill $(pidof libvirtd)
Thread 1 "libvirtd"...