guest cleanup script fails to iterate
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Undecided | Unassigned |
Artful | Fix Released | Undecided | Unassigned |
Bionic | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
* Due to a bug in a recent fix, guests are now immediately considered shut
down. As a result, the helper that shuts down guests cleanly might not
wait long enough.
* The fix is to backport the upstream fix to the affected releases.
[Test Case]
* Spawn a few KVM guests through libvirt, e.g.
$ uvt-simplestrea
$ uvt-kvm create --memory 128 --password ubuntu b1 arch=amd64 release=bionic label=daily
$ uvt-kvm create --memory 128 --password ubuntu b2 arch=amd64 release=bionic label=daily
$ uvt-kvm create --memory 128 --password ubuntu b3 arch=amd64 release=bionic label=daily
* A) Shut down the computer and watch how it handles guest shutdown.
* B) This is easier to track if, instead of shutting down, you directly
call what the shutdown would run:
$ sudo /usr/lib/
* The affected script will consider guests as shut down no matter what, so
you might want to modify the guests so that they do not shut down or, if
your system allows it, use huge guests with plenty of dirty memory, which
slows the shutdown.
One good trick is to start the guests right before stopping them.
Guests do not yet respond to shutdown signals at that time, so the
issue is revealed (no waiting on shutdown).
To do so, stop the guests and then run:
$ virsh start b1; virsh start b2; virsh start b3;
$ /usr/lib/
The bad case looks like:
Running guests on default URI: b1, b2, b3
Shutting down guests on default URI...
Starting shutdown on guest: b1
Starting shutdown on guest: b2
Starting shutdown on guest: b3
Waiting for 3 guests to shut down, 120 seconds left
Shutdown of guest b1 complete.
Shutdown of guest b2 complete.
Shutdown of guest b3 complete.
But you can check with virsh list that they are still alive.
The improved case at least waits properly:
...
Waiting for 3 guests to shut down, 120 seconds left
Waiting for 3 guests to shut down, 115 seconds left
Waiting for 3 guests to shut down, 110 seconds left
...
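The waiting behaviour shown in the improved case can be sketched roughly as follows. This is only an illustration and not the code of the actual script: the function names, the STATE_DIR marker files and the small timeout values are all assumptions made so that the sketch runs without libvirt. Only the message texts are taken from the output above.

```shell
#!/bin/sh
# Hypothetical sketch; names, values and the state check are assumptions
# and do not come from the real shutdown helper.

# Stand-in for a real state query: a guest counts as running while a
# marker file with its name exists in $STATE_DIR.
guest_is_running() {
    [ -e "$STATE_DIR/$1" ]
}

# wait_for_guests TIMEOUT INTERVAL GUEST...
# Returns 0 once all guests are gone, 1 when the timeout expires.
wait_for_guests() {
    timeout=$1; interval=$2; shift 2
    remaining="$*"
    while [ -n "$remaining" ]; do
        if [ "$timeout" -le 0 ]; then
            echo "Timeout expired while shutting down domains"
            return 1
        fi
        set -- $remaining                 # re-split the list to count it
        echo "Waiting for $# guests to shut down, $timeout seconds left"
        still=""
        for guest in "$@"; do
            if guest_is_running "$guest"; then
                still="$still $guest"
            else
                echo "Shutdown of guest $guest complete."
            fi
        done
        remaining="${still# }"
        if [ -n "$remaining" ]; then
            sleep "$interval"
        fi
        timeout=$((timeout - interval))
    done
    return 0
}

# Demo: b1 never goes away, b2 and b3 are already gone.
STATE_DIR=$(mktemp -d)
touch "$STATE_DIR/b1"
wait_for_guests 2 1 b1 b2 b3 || echo "some guests are still running"
rm -rf "$STATE_DIR"
```

The point of the sketch is that the list of remaining guests is rebuilt from an actual state check on every pass, so a guest is only reported as complete once it is really gone.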
[Regression Potential]
* The fix only affects this shutdown handling and no other part of
libvirt, which limits the range of potential regressions.
* The current issue is that we don't wait long enough for the guests. We
can't wait "less", so we won't regress to being less forgiving of slow
shutdowns. If there were a mistake in the code, the regression risk would
be slower shutdowns. Tests did not show that, but since this section is
about regression potential, it is listed here.
[Other Info]
* n/a
---
The recent fix to libvirt-
(variable scope) but failed to adapt the loop in check_guests_
Due to that it currently detects all guests as "Failed to determine state of guest".
It all works, but guests are just assumed dead and the logs are flooded with false-positive warnings.
A suggested fix is at: https:/
Without the fix, the guests were in my case always detected as immediately shut down, which is likely wrong and can end in a non-clean shutdown even though it would have been clean if the guests had been given more time.
This happened because the loop picked up the empty external variable to iterate over, so the returned list was always empty.
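The class of bug described here can be illustrated with a small sketch. It is not the code of the affected script: every name below (guest_is_running, check_guests_buggy, check_guests_fixed, outer_guests) is hypothetical, and the state check is a stub so that the example runs without libvirt.

```shell
#!/bin/sh
# Hypothetical illustration; none of these names come from the real script.

# Stand-in for a state query: pretend only b2 is still running.
guest_is_running() {
    [ "$1" = "b2" ]
}

# Buggy variant: iterates over an outer variable that is empty,
# ignoring the guests passed in as arguments.
check_guests_buggy() {
    still_running=""
    for guest in $outer_guests; do        # empty, so the body never runs
        if guest_is_running "$guest"; then
            still_running="$still_running $guest"
        fi
    done
    echo $still_running
}

# Fixed variant: iterates over the list it was actually given.
check_guests_fixed() {
    still_running=""
    for guest in "$@"; do
        if guest_is_running "$guest"; then
            still_running="$still_running $guest"
        fi
    done
    echo $still_running
}

outer_guests=""
echo "buggy: '$(check_guests_buggy b1 b2 b3)'"   # prints: buggy: ''
echo "fixed: '$(check_guests_fixed b1 b2 b3)'"   # prints: fixed: 'b2'
```

With the buggy variant the caller always receives an empty list and concludes that no guest is left running, which matches the "immediately shut down" symptom described above.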
Tests with the fix:
Tested guests not going away (just kill them so early that they do not respond to shutdown yet)
=> ok (took configured max time)
Running guests on default URI: b1, b2, b3
Shutting down guests on default URI...
Starting shutdown on guest: b1
Starting shutdown on guest: b2
Starting shutdown on guest: b3
Waiting for 3 guests to shut down, 120 seconds left
Waiting for 3 guests to shut down, 115 seconds left
Waiting for 3 guests to shut down, 110 seconds left
Waiting for 3 guests to shut down, 105 seconds left
...
Waiting for 3 guests to shut down, 5 seconds left
Timeout expired while shutting down domains
Tested guests going away one by one (just kill them so early that they do not respond to shutdown yet) - and then virsh destroy them over time
=> ok (detected one by one as they should be)
Running guests on default URI: b1, b2, b3
Shutting down guests on default URI...
Starting shutdown on guest: b1
Starting shutdown on guest: b2
Starting shutdown on guest: b3
Waiting for 3 guests to shut down, 120 seconds left
Waiting for 3 guests to shut down, 115 seconds left
Shutdown of guest b2 complete.
Waiting for 2 guests to shut down, 110 seconds left
Waiting for 2 guests to shut down, 105 seconds left
Shutdown of guest b3 complete.
Waiting for 1 guests to shut down, 100 seconds left
Waiting for 1 guests to shut down, 95 seconds left
Waiting for 1 guests to shut down, 90 seconds left
Shutdown of guest b1 complete.
Tested all guests going away immediately (normal guests)
=> ok, detected all in one "loop"
Running guests on default URI: b1, b2, b3
Shutting down guests on default URI...
Starting shutdown on guest: b1
Starting shutdown on guest: b2
Starting shutdown on guest: b3
Waiting for 3 guests to shut down, 120 seconds left
Shutdown of guest b1 complete.
Shutdown of guest b2 complete.
Shutdown of guest b3 complete.
Tested parallel=2 with guests going away one by one (but eventually not all of them).
=> ok, one guest going down kicked off the next shutdown
=> ok, partial looping still worked to catch the timeout
Running guests on default URI: b1, b2, b3
Shutting down guests on default URI...
Starting shutdown on guest: b1
Starting shutdown on guest: b2
Waiting for 3 guests to shut down, 60 seconds left
Shutdown of guest b1 complete.
Starting shutdown on guest: b3
Waiting for 2 guests to shut down, 55 seconds left
...
Waiting for 2 guests to shut down, 5 seconds left
Timeout expired while shutting down domains
Tested parallel=2 with guests going away normally
=> ok, one guest going down kicked off the next shutdown
=> ok, loop continued and completed
Running guests on default URI: b1, b2, b3
Shutting down guests on default URI...
Starting shutdown on guest: b1
Starting shutdown on guest: b2
Waiting for 3 guests to shut down, 60 seconds left
Shutdown of guest b1 complete.
Shutdown of guest b2 complete.
Starting shutdown on guest: b3
Shutdown of ...