libvirt-guests.sh fails to shutdown guests in parallel
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt | Fix Released | Undecided | |
libvirt (Ubuntu) | Fix Released | Medium | Christian Ehrhardt |
Xenial | Fix Released | Medium | Jorge Niedbalski |
Zesty | Won't Fix | Medium | Jorge Niedbalski |
Artful | Fix Released | Medium | Jorge Niedbalski |
Bug Description
[Environment]
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
[Impact]
There is a bug/race condition on libvirt-
The critical chain for this service is:
libvirt-
└─libvirt-
└─remote-
└─remote-
└
As an example, I have the following kvm host with 42 virtual
machines.
ubuntu@
Id Name State
-------
12 locked-trusty-2 running
13 locked-trusty-3 running
[...]
41 locked-trusty-42 running
After rebooting the machine:
[ 250.999516] libvirt-
[ 251.011367] libvirt-
[ 251.027072] libvirt-
[...]
[ 391.949941] libvirt-
[ 398.074405] libvirt-
[ 403.020479] libvirt-
[ OK ] Stopped Suspend Active Libvirt Guests.
[ OK ] Stopped target System Time Synchronized.
[Test Case]
* Make sure the following variables are set in /etc/default/
ON_SHUTDOWN=
PARALLEL_
SHUTDOWN_
* Create over 20 virtual machines (in my case, using uvt-kvm).
$ for f in $(seq 0 40); do uvt-kvm create --memory 2000 --cpu 1 locked-trusty-$f release=xenial arch=amd64 ; done
* Reboot the machine and monitor the systemd service stop sequence
or console output.
(With systemd: systemctl start debug-shell and jump to the debug console with Ctrl+Alt+F9)
* Error message "Timeout expired while shutting down domains" should
be displayed.
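Before rebooting, it can help to confirm that enough guests are actually running. The following is a minimal sketch, not part of the original test case: count_running is a hypothetical helper that counts the rows of a virsh list style table whose last column is "running".

```shell
#!/bin/bash
# Hypothetical helper: read a "virsh list" style table on stdin and print
# how many rows have "running" as their last field.
count_running() {
    awk '$NF == "running" { n++ } END { print n + 0 }'
}

# Only query libvirt when virsh is installed, so the sketch stays runnable
# on machines without it.
if command -v virsh >/dev/null 2>&1; then
    running=$(virsh list 2>/dev/null | count_running)
    echo "running guests: $running"
fi
```

The test case asks for more than 20 guests, so the reported count should be above 20 before the reboot.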
[Regression Potential]
* None identified.
[Other Info]
* A patch has already been proposed upstream and has been
linked to this bug: https:/
tags: | added: patch |
Changed in libvirt: | |
importance: | Unknown → Undecided |
status: | Unknown → In Progress |
Changed in libvirt (Ubuntu): | |
status: | Triaged → Confirmed |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in libvirt (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in libvirt (Ubuntu Zesty): | |
status: | New → In Progress |
Changed in libvirt (Ubuntu Artful): | |
status: | New → In Progress |
Changed in libvirt (Ubuntu Xenial): | |
importance: | Undecided → Medium |
Changed in libvirt (Ubuntu Zesty): | |
importance: | Undecided → Medium |
Changed in libvirt (Ubuntu Artful): | |
importance: | Undecided → Medium |
Changed in libvirt (Ubuntu Xenial): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in libvirt (Ubuntu Zesty): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in libvirt (Ubuntu Artful): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
description: | updated |
description: | updated |
Changed in libvirt: | |
status: | In Progress → Fix Released |
Changed in libvirt (Ubuntu Zesty): | |
status: | In Progress → Won't Fix |
tags: | added: sts-sponsor-ddstreet |
tags: | removed: sts-sponsor-ddstreet |
Hi Christoph, and thanks for your report and thereby helping to make Ubuntu better!
The default value for this is PARALLEL_SHUTDOWN=10 so everybody would run into this issue.
I assume that there needs to be more to this than just "broken in general", so let us try to find what it is that makes this fail for you.
These scripts weren't touched for a long time, as they just used to work so far.
I wondered why for me "check_guests_shutdown" is on a different line (353).
That might just be a typo or such, but to be sure, could you check with dpkg --verify whether the package thinks the file is non-default (after you remove your modification, of course):
dpkg --verify libvirt-bin
I checked the md5 of the file in the version you referred to, which is:
$ md5sum /usr/lib/libvirt/libvirt-guests.sh
611e4b35894329192f0313c1c2c639aa  /usr/lib/libvirt/libvirt-guests.sh
Nevertheless, I found the issue you are describing.
The assignment is:
444: on_shutdown=$(check_guests_shutdown "$uri" "$on_shutdown")
The report of the translated message is like:
361: eval_gettext "Failed to determine state of guest: \$guest. Not tracking it anymore."
While this is certainly broken and needs a fix, it should at least still time out for you after the default of 2 minutes, right?
As the most convenient workaround, you could lower the timeout until a proper fix is available.
Also, the issue only occurs if the function guest_is_on fails (so the guest is detected neither as running nor as not running; the check itself really fails). Eventually that executes:
$ virsh domname <uuid>
That should also fail in your case to trigger the issue - is there any obvious reason you'd know why that fails for you? The output of this should also be mixed into the result in your case, so maybe you find it there.
But while it is interesting to understand why this is triggering for you, it is an issue nonetheless.
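The interaction between the command substitution and the eval_gettext message quoted above can be illustrated with a small sketch. It does not use the real libvirt-guests.sh code: check_guests_sketch and check_guests_sketch_stderr are hypothetical stand-ins, and "broken" is a made-up guest name used to simulate a failed state check. Sending the diagnostic to stderr is shown only as one possible way to keep it out of the captured result, not necessarily the fix that was applied.

```shell
#!/bin/bash
# Hypothetical stand-in for a function whose stdout is captured by its caller.
# It should print only the names of the guests that are still being tracked.
check_guests_sketch() {
    local remaining="" guest
    for guest in "$@"; do
        if [ "$guest" = "broken" ]; then
            # Problem pattern: the diagnostic goes to stdout, so the caller's
            # command substitution captures it together with the guest list.
            echo "Failed to determine state of guest: $guest. Not tracking it anymore."
            continue
        fi
        remaining="${remaining:+$remaining }$guest"
    done
    echo "$remaining"
}

# Same logic, but the diagnostic goes to stderr and stays out of the result.
check_guests_sketch_stderr() {
    local remaining="" guest
    for guest in "$@"; do
        if [ "$guest" = "broken" ]; then
            echo "Failed to determine state of guest: $guest. Not tracking it anymore." >&2
            continue
        fi
        remaining="${remaining:+$remaining }$guest"
    done
    echo "$remaining"
}

polluted=$(check_guests_sketch guest-a broken guest-b)
clean=$(check_guests_sketch_stderr guest-a broken guest-b 2>/dev/null)

printf 'polluted: %s\n' "$polluted"
printf 'clean: %s\n' "$clean"
```

In the first call, every word of the message ends up in the variable that is later treated as a list of guests, which matches the observation that the output gets mixed into the result.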