libvirt-guests.sh fails to shutdown guests in parallel

Bug #1688508 reported by Christoph Wolff
This bug affects 3 people
Affects            Status         Importance   Assigned to          Milestone
libvirt            Fix Released   Undecided
libvirt (Ubuntu)   Fix Released   Medium       Christian Ehrhardt
  Xenial           Fix Released   Medium       Jorge Niedbalski
  Zesty            Won't Fix      Medium       Jorge Niedbalski
  Artful           Fix Released   Medium       Jorge Niedbalski

Bug Description

[Environment]

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

[Impact]

There is a bug/race condition in libvirt-guests.service that prevents guests from being shut down in parallel.

The critical chain for this service is:

libvirt-guests.service +20ms
└─libvirt-bin.service @2.784s +140ms
  └─remote-fs.target @2.777s
    └─remote-fs-pre.target @2.775s
      └─open-iscsi.service @2.554s +116ms
        └─iscsid.service @2.525s +18ms
          └─network-online.target @2.502s
            └─network.target @1.955s
              └─networking.service @1.625s +299ms
                └─network-pre.target @1.601s
                  └─cloud-init-local.service @405ms +1.072s
                    └─systemd-remount-fs.service @232ms +64ms
                      └─systemd-journald.socket @178ms
                        └─-.slice @117ms

As an example, I have the following kvm host with 42 virtual
machines.

ubuntu@xenial-base:~$ virsh list --all
 Id Name State
----------------------------------------------------
 12 locked-trusty-2 running
 13 locked-trusty-3 running
[...]
 41 locked-trusty-42 running

After rebooting the machine:

[ 250.999516] libvirt-guests.sh[4215]: Running guests on default URI: locked-trusty-2, locked-trusty-4, locked-trusty-12, locked-trusty-3, locked-trusty-5, locked-trusty-11, locked-trusty-10, locked-trusty-8, locked-trusty-9, locked-trusty-7, locked-trusty-6, locked-trusty-13, locked-trusty-14, locked-trusty-15, locked-trusty-16, locked-trusty-17, locked-trusty-18, locked-trusty-19, locked-trusty-20, locked-trusty-21, locked-trusty-22, locked-trusty-23, locked-trusty-24, locked-trusty-25, locked-trusty-26, locked-trusty-27, locked-trusty-28, locked-trusty-29, locked-trusty-30, locked-trusty-31, locked-trusty-32, locked-trusty-33, locked-trusty-34, locked-trusty-35, locked-trusty-36, locked-trusty-37, locked-trusty-38, locked-trusty-39, locked-trusty-40, locked-trusty-41, locked-trusty-42
[ 251.011367] libvirt-guests.sh[4215]: Shutting down guests on default URI...
[ 251.027072] libvirt-guests.sh[4215]: Starting shutdown on guest: locked-trusty-2
[...]
[ 391.949941] libvirt-guests.sh[4215]: Waiting for 28 guests to shut down, 10 seconds left
[ 398.074405] libvirt-guests.sh[4215]: Waiting for 28 guests to shut down, 5 seconds left
[ 403.020479] libvirt-guests.sh[4215]: Timeout expired while shutting down domains
[ OK ] Stopped Suspend Active Libvirt Guests.
[ OK ] Stopped target System Time Synchronized.

[Test Case]

 * Make sure the following variables are set in /etc/default/libvirt-guests (which are all default options):

ON_SHUTDOWN=shutdown
PARALLEL_SHUTDOWN=10
SHUTDOWN_TIMEOUT=120

 * Create over 20 virtual machines (in my case, using uvt-kvm).

$ for f in $(seq 0 40); do uvt-kvm create --memory 2000 --cpu 1 locked-trusty-$f release=xenial arch=amd64 ; done

 * Reboot the machine and monitor the systemd service stop sequence
or console output.

(With systemd: systemctl start debug-shell and jump to ctrl+alt+f9)

* Error message "Timeout expired while shutting down domains" should
be displayed.
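
To double-check the three settings before rebooting, something like the following could be used. It is only a sketch: read_setting is a hypothetical helper (not part of libvirt), and it assumes the config file consists of plain shell variable assignments. The demo runs against a temporary file rather than the real /etc/default/libvirt-guests.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one variable from a
# shell-style config file such as /etc/default/libvirt-guests.
read_setting() {
    # Source the file in a subshell so nothing leaks into this shell.
    ( . "$1" >/dev/null 2>&1; eval "printf '%s\n' \"\${$2}\"" )
}

# Demo against a temporary copy of the settings from the test case.
conf=$(mktemp)
cat > "$conf" <<'EOF'
ON_SHUTDOWN=shutdown
PARALLEL_SHUTDOWN=10
SHUTDOWN_TIMEOUT=120
EOF

for name in ON_SHUTDOWN PARALLEL_SHUTDOWN SHUTDOWN_TIMEOUT; do
    printf '%s=%s\n' "$name" "$(read_setting "$conf" "$name")"
done
rm -f "$conf"
```

On a real host the first argument would be /etc/default/libvirt-guests.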

[Regression Potential]

* None identified.

[Other Info]

* There is already a proposed patch upstream, which has been
linked to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1450141

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Christoph, and thanks for your report and for thereby helping to make Ubuntu better!

The default value for this is PARALLEL_SHUTDOWN=10 so everybody would run into this issue.
I assume that there needs to be more to this than just "broken in general", so let us try to find what it is that makes this fail for you.

These scripts haven't been touched for a long time, as they have just worked so far.
I noticed that for me "check_guests_shutdown" is on a different line (353).

That might just be a typo or similar, but to be sure, could you check whether the package thinks the file is non-default (after you remove your modification, of course):
dpkg --verify libvirt-bin
I checked the md5 of the file in the version you referred to which is:
$ md5sum /usr/lib/libvirt/libvirt-guests.sh
611e4b35894329192f0313c1c2c639aa /usr/lib/libvirt/libvirt-guests.sh

Nevertheless, I found the issue you are describing:
The assignment is:
444: on_shutdown=$(check_guests_shutdown "$uri" "$on_shutdown")
The translated message is reported like this:
361: eval_gettext "Failed to determine state of guest: \$guest. Not tracking it anymore."

While this is certainly broken and needs a fix, it should at least still time out for you after the default of 2 minutes, right?
Until a proper fix is available, lowering the timeout might be the most convenient workaround.

Also the issue only occurs if function guest_is_on fails (so neither detected run, nor not running, but really failing). Eventually that executes:
$ virsh domname <uuid>
That should also fail in your case to trigger the issue - is there any obvious reason you'd know why that fails for you? The output of this should also be mixed into the result in your case, so maybe you find it there.

But while it is interesting to understand why this is triggering for you it is an issue none-the-less
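
To make the mechanism concrete, here is a minimal sketch of how a message printed inside a function ends up in a variable assigned via command substitution. check_guests_demo and the guest names are made up for illustration; this is not the code from libvirt-guests.sh, only the same pattern as the assignment on line 444 combined with the message on line 361.

```shell
#!/bin/sh
# Simplified stand-in for check_guests_shutdown: it prints the guests that
# are still running, but also prints a diagnostic on the same stream.
check_guests_demo() {
    for guest in $1; do
        case $guest in
            bad-*) echo "Failed to determine state of guest: $guest. Not tracking it anymore." ;;
            *)     printf '%s ' "$guest" ;;
        esac
    done
}

# Command substitution captures everything written to stdout, so the
# words of the diagnostic become "guests" in the tracking list.
on_shutdown=$(check_guests_demo "vm-1 bad-2")
set -- $on_shutdown
tracked=$#
echo "tracked entries: $tracked"
```

Only vm-1 is a real guest here, yet every word of the message is counted as an entry as well.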

Changed in libvirt (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
tags: added: need-upstream-report
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The script itself is not special to Ubuntu but taken from upstream.

I can see a fairly simple fix: adapt it in the style of "guest_is_on" so that it returns the wanted content in a variable, which avoids polluting the result with any output. Still, it should be fixed upstream.

As far as I was able to check, it is still broken in the latest master as of today.
As the one who found it, would you mind reporting the issue there and linking the bug or mailing list entry here, to help track the discussion?
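
A rough sketch of that variable-based style, under the assumption that the function hands its result back in a global variable (the name guests_shutting_down is borrowed from the debugging notes later in this report; check_guests_demo and the guest names are hypothetical):

```shell
#!/bin/sh
# Simplified stand-in for check_guests_shutdown: the result is stored in
# a global variable, so anything printed stays out of the result.
check_guests_demo() {
    guests_shutting_down=
    for guest in $1; do
        case $guest in
            bad-*) echo "Failed to determine state of guest: $guest. Not tracking it anymore." ;;
            *)     guests_shutting_down="$guests_shutting_down $guest" ;;
        esac
    done
}

check_guests_demo "vm-1 bad-2"      # the diagnostic goes to the console
on_shutdown=$guests_shutting_down   # only real guests end up here
set -- $on_shutdown
tracked=$#
echo "tracked entries: $tracked"
```

The trade-off is the usual one for shell: the caller and the function now share a global name, so the variable names have to be chosen carefully.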

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm not a big fan of the bash variable scoping in there, but sticking to the style used this might be a fix to the issue that was reported.

It would be great if you could try this diff on your file and see if it resolves your issues as well.
If it does please feel free to forward it as RFC on your upstream report.

tags: added: server-next
tags: added: patch
Revision history for this message
Christoph Wolff (yonk) wrote :

Hello Christian, thank you very much for your detailed response.

>The default value for this is PARALLEL_SHUTDOWN=10 so everybody would run into this issue.
>I assume that there needs to be more to this than just "broken in general", so let us try to find what it is that makes this fail for you.

These were exactly my thoughts when I encountered the bug.

>While certainly broken and needing a fix this should at least still time out for you after the default of 2 minutes right?
>You could lessen the timeout as the most convenient until a proper fix is there then.

Actually no, that was my first guess too and I turned down the time-out. But what was actually happening was that since it failed to shut down the VMs, the check_guests_shutdown() got called repeatedly, thereby adding more error messages to the list of VMs to shut down and so on. So it actually never timed out because the list of VMs only grew longer.
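
As a toy model of that feedback loop (not the real script; check_round and the guest name are invented for illustration), each entry that cannot be resolved yields one message, and the words of that message are fed back in as entries on the next round:

```shell
#!/bin/sh
# Toy model of the feedback loop: every entry that cannot be resolved
# produces a message, and the message's words are fed back in as entries.
check_round() {
    for entry in $1; do
        printf 'Failed to determine state of guest: %s. Not tracking it anymore.\n' "$entry"
    done
}

list="gone-vm"
sizes=
for round in 1 2 3; do
    list=$(check_round "$list")
    set -- $list
    sizes="$sizes $#"
done
echo "entries per round:$sizes"
```

In this model the list is multiplied by the number of words in the message on every pass, which matches the observation that the list only ever grows.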

>I wondered that for me "check_guests_shutdown" is on a different line (353) then.
>That might just be a typo or such, but to be sure could you check with verify if the package thinks the file is non default (after you remove your modification of course):

I'm pretty sure the file is default; it's probably an empty line somewhere from when I started debugging, but I will check on that right away. I have also already tried downloading the newest version from upstream, but you are right in that it remained pretty much the same (and that script also did not work).

>Also the issue only occurs if function guest_is_on fails (so neither detected run, nor not running, but really failing). Eventually that executes:
>$ virsh domname <uuid>
>That should also fail in your case to trigger the issue - is there any obvious reason you'd know why that fails for you? The output of this should also be mixed into the result in your case, so maybe you find it there.

Hmm, initially I thought that was just a very bad way of checking if the VM was still running, but upon closer inspection you are right. But when I manually run something like "virsh domname $uuid" it gives me the domname as output, so it seems to work fine. What might cause trouble here is that these VMs are 'transient', i.e. they do not keep their UUID after shutdown. That would explain why it can't check whether or not the VM has been shut down. I only know this because when I choose "suspend" as value for "ON_SHUTDOWN", it tells me that "transient VMs can't be suspended".
Maybe I should mention that I run libvirt with opennebula, which basically puts a nice interface to KVM to manage VMs, so the transient thing comes from there.
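
One way to check whether a guest is transient might be to look at the "Persistent:" field that "virsh dominfo" prints. The sketch below only parses such text from stdin, so it can be tried without libvirt; is_transient is a hypothetical helper, the sample values are made up, and it assumes untranslated (C locale) output without trailing whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read `virsh dominfo <domain>` output on stdin and
# succeed if the domain is reported as transient (Persistent: no).
is_transient() {
    awk -F': *' '$1 == "Persistent" { found = ($2 == "no") } END { exit !found }'
}

# Sample text in the shape dominfo prints; the values are made up.
sample='Name:           one-44
State:          running
Persistent:     no'

if printf '%s\n' "$sample" | is_transient; then
    echo "one-44 is transient"
fi
```

Against a live host this would be used as: LC_ALL=C virsh dominfo "$uuid" | is_transient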

>Would you mind as being the one who found it to report the issue there and linking the bug or mailing list entry here to help tracking the discussion there?

No problem, I will do that.

>It would be great if you could try this diff on your file and see if it resolves your issues as well.
Sadly the patch did not help, although it changed the faulty behaviour, yay! Now I get repeated output looking like this:

sudo ./libvirt-guests.sh stop

Running guests on default URI: one-44, one-38
Shutting down guests on default URI...
Starting shutdown on guest: one-44
Star...


Revision history for this message
Christoph Wolff (yonk) wrote :
Revision history for this message
Christoph Wolff (yonk) wrote :

Oops that was the wrong link, this one is correct https://bugzilla.redhat.com/show_bug.cgi?id=1450141

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for reporting as I asked. I added the tracker to the bug above, so this bug will get a notification (for us to pick something up) once Peter has written an official fix.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

No movement on the upstream side yet that we could pick up - or the bug was not updated?
I'll stay subscribed but drop server-next now as we can't do anything in the short term.

tags: removed: server-next
Changed in libvirt:
importance: Unknown → Undecided
status: Unknown → In Progress
Changed in libvirt (Ubuntu):
status: Triaged → Confirmed
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in libvirt (Ubuntu Xenial):
status: New → In Progress
Changed in libvirt (Ubuntu Zesty):
status: New → In Progress
Changed in libvirt (Ubuntu Artful):
status: New → In Progress
Changed in libvirt (Ubuntu Xenial):
importance: Undecided → Medium
Changed in libvirt (Ubuntu Zesty):
importance: Undecided → Medium
Changed in libvirt (Ubuntu Artful):
importance: Undecided → Medium
Changed in libvirt (Ubuntu Xenial):
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in libvirt (Ubuntu Zesty):
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in libvirt (Ubuntu Artful):
assignee: nobody → Jorge Niedbalski (niedbalski)
description: updated
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

Hello,

I've updated the bug description with the reproducer steps and details
on the bug itself.

I've also prepared a PPA for testing this on top of the current xenial-updates
archive (https://launchpad.net/~niedbalski/+archive/ubuntu/lp1688508).

I will proceed with the devel fixing and then start the SRU process for
all of the affected series.

description: updated
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

For reference, with @paelzer's patch applied,
the following is the libvirt-guests.sh stop sequence.

root@xenial-base:/usr/lib/libvirt# ./libvirt-guests.sh stop

Running guests on default URI: guest-11, guest-16, guest-9, guest-19, guest-20, guest-23, guest-15, guest-29, guest-7, guest-39, guest-36, guest-27, guest-40, guest-14, guest-10, guest-4, guest-3, guest-28, guest-26, guest-32, guest-2, guest-25, guest-6, guest-38, guest-1, guest-35, guest-31, guest-8, guest-30, guest-12, guest-21, guest-5, guest-18, guest-0, guest-17, guest-34, guest-13, guest-24, guest-33, guest-37, guest-22
Shutting down guests on default URI...
Starting shutdown on guest: guest-11
Starting shutdown on guest: guest-16
Starting shutdown on guest: guest-9
Starting shutdown on guest: guest-19
Starting shutdown on guest: guest-20
Starting shutdown on guest: guest-23
Starting shutdown on guest: guest-15
Starting shutdown on guest: guest-29
Starting shutdown on guest: guest-7
Starting shutdown on guest: guest-39
Waiting for 41 guests to shut down, 120 seconds left
Waiting for 20 guests to shut down, 115 seconds left
Waiting for 20 guests to shut down, 110 seconds left
Waiting for 20 guests to shut down, 105 seconds left
Waiting for 20 guests to shut down, 100 seconds left
Waiting for 20 guests to shut down, 95 seconds left
Waiting for 20 guests to shut down, 90 seconds left
Waiting for 20 guests to shut down, 85 seconds left
Waiting for 20 guests to shut down, 80 seconds left
Waiting for 20 guests to shut down, 75 seconds left
Waiting for 20 guests to shut down, 70 seconds left
Waiting for 20 guests to shut down, 65 seconds left
Waiting for 20 guests to shut down, 60 seconds left
Waiting for 20 guests to shut down, 55 seconds left
Waiting for 20 guests to shut down, 50 seconds left
Waiting for 20 guests to shut down, 45 seconds left
Waiting for 20 guests to shut down, 40 seconds left
Waiting for 20 guests to shut down, 35 seconds left
Waiting for 20 guests to shut down, 30 seconds left
Waiting for 20 guests to shut down, 25 seconds left
Waiting for 20 guests to shut down, 20 seconds left
Waiting for 20 guests to shut down, 15 seconds left

Waiting for 20 guests to shut down, 10 seconds left
Waiting for 20 guests to shut down, 5 seconds left
Timeout expired while shutting down domains

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Jorge, on this as I outlined before please get the change upstream.
Peter (in the linked Bugzilla) took the work but might be busy with other things.
Please help him to get something upstream and then think on the SRU for this.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Yep, prepared tested and uploaded to bionic-proposed now.

=> https://launchpad.net/ubuntu/+source/libvirt/4.0.0-1ubuntu4

Changed in libvirt (Ubuntu):
status: Confirmed → In Progress
assignee: Jorge Niedbalski (niedbalski) → ChristianEhrhardt (paelzer)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 4.0.0-1ubuntu4

---------------
libvirt (4.0.0-1ubuntu4) bionic; urgency=medium

  * d/p/ubuntu/lp1688508-tools-avoid-text-spilling-into-variables.patch:
    avoid hanging on shutdown (LP: #1688508)

 -- Christian Ehrhardt <email address hidden> Fri, 23 Feb 2018 16:43:19 +0100

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Changed in libvirt:
status: In Progress → Fix Released
Changed in libvirt (Ubuntu Zesty):
status: In Progress → Won't Fix
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

SRU proposal for Artful.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

SRU proposal for Xenial.

Dan Streetman (ddstreet)
tags: added: sts-sponsor-ddstreet
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The change is in Bionic and Tested, I fixed up a minor issue in the Dep3 header that I made in Bionic.
We don't have to carry that to the SRUs.

Both checked and sponsored.
(I needed to update my git trees for the recent security updates anyway, so I could do both at once)

Please help to track this through SRU acceptance and proposed-migration.
Especially the bug verification as you know it is a bit more complex to set up in this case and you have a setup to do so already.

Dan Streetman (ddstreet)
tags: removed: sts-sponsor-ddstreet
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Christoph, or anyone else affected,

Accepted libvirt into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/1.3.1-1ubuntu10.20 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Changed in libvirt (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed-artful
Revision history for this message
Chris J Arges (arges) wrote :

Hello Christoph, or anyone else affected,

Accepted libvirt into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/3.6.0-1ubuntu6.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Dariusz, would you mind as the reporter and Author to do the verification on this one?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Sorry Christoph you are the reporter of course, but @Dariusz I'd still appreciate if you would take over the verification since it is out for some time and there is no community verification yet.

Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

log from verification on Xenial.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Dariusz,
TL;DR we want this to be cancelled from SRU and need to rework upstream.
From there into Bionic (which would be released that way) and from there reconsidering SRU.

Resetting Task status and verification tags.

tags: added: 4.0.0-1ubuntu6 needs-upstreaming
tags: added: verification-failed verification-failed-artful verification-failed-xenial
removed: verification-needed verification-needed-artful verification-needed-xenial
Changed in libvirt (Ubuntu Artful):
status: Fix Committed → Triaged
Changed in libvirt (Ubuntu Xenial):
status: Fix Committed → Triaged
Changed in libvirt (Ubuntu):
status: Fix Released → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

TL;DR on the issue, the change upstream fixed the issue, but made some even worse issues visible.
It is not yet clear whether the errors came in with the change itself or were just hidden before.

Needs repro, debug, fix, test, upstreaming, bionic, SRUs (in that order)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Repro:
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=xenial
$ for f in $(seq 0 40); do uvt-kvm create --memory 2048 --cpu 4 xenial-testshutdown-$f release=xenial arch=amd64 ; done

Be aware of the up to 80G memory consumption (in theory, since they idle it is more like 20G).

Run
$ sudo systemctl stop libvirt-guests

The first 10 will be shut down fine (and more correctly than before the last fix).
But if you have MORE than the PARALLEL_SHUTDOWN= value then it fails to shut down those.

This seems to be an issue with updating some variables in the script.
At least the timeout properly kicks in and ends this (which makes it no worse than pre the last change where it could forget some guests).

FYI
More easily tested with 5 guests and the config set to
PARALLEL_SHUTDOWN=2
SHUTDOWN_TIMEOUT=20

Modify as needed, then run
sudo /usr/lib/libvirt/libvirt-guests.sh stop

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

list_guests lists all 6
  8b5645c2-12e5-4ec7-9350-08adb3e8cb0f
  ef55638c-926a-4d43-9790-99aae3010951
  119975ea-8d86-4114-84db-02f1f655d279
  afe0321b-2deb-44c8-9cd3-727882b140b9
  6d0a05d2-b40c-433e-b656-3519626dcbf3
  a9224770-a4b7-4044-b4ce-2ceb934c1e4b

All xenial-testshutdown-* get printed to stdout

Listfile gets URI + guests:
+ echo default 8b5645c2-12e5-4ec7-9350-08adb3e8cb0f ef55638c-926a-4d43-9790-99aae3010951 119975ea-8d86-4114-84db-02f1f655d279 afe0321b-2deb-44c8-9cd3-727882b140b9 6d0a05d2-b40c-433e-b656-3519626dcbf3 a9224770-a4b7-4044-b4ce-2ceb934c1e4b

shutdown_guests_parallel called with:
+ uri=default
+ guests=8b5645c2-12e5-4ec7-9350-08adb3e8cb0f ef55638c-926a-4d43-9790-99aae3010951 119975ea-8d86-4114-84db-02f1f655d279 afe0321b-2deb-44c8-9cd3-727882b140b9 6d0a05d2-b40c-433e-b656-3519626dcbf3 a9224770-a4b7-4044-b4ce-2ceb934c1e4b

on_shutdown= (empty initially)
loops until
  $on_shutdown is empty (again)
  AND
  $guests is empty (expects remove of handled guests)

    Then counts current $on_shutdown vs $PARALLEL_SHUTDOWN target.
    Makes $guests the arguments via "set --"
    Picks $1 as guest
    Shifts $1 out of the args
    guests is assigned the remaining guests.
    TL;DR This popped $1 into $guest and shrunk $guests by 1

    Checks if this popped val is already in $on_shutdown
    Gets a guest name
    Calls shutdown_guest_async with that $guest
=> 8b5645c2-12e5-4ec7-9350-08adb3e8cb0f
    Extends on_shutdown by $guest

    LOOPS UP

    $guests still full
    Then counts current $on_shutdown vs $PARALLEL_SHUTDOWN target - still some to go.
    Again pops one and shrinks $guests

    Checks if this popped val is already in $on_shutdown
    Gets a guest name
    Calls shutdown_guest_async with that $guest
=> ef55638c-926a-4d43-9790-99aae3010951
    Extends on_shutdown by $guest

  Now it has 2 (of 2) async shutdowns started

  sleep 1
  Counts remaining guests = 4
  Counts on shutdown = 2

  Reports on progress in regard to timeout

  # Saves current on_shutdown
  on_shutdown_prev=$on_shutdown

  # Determines how much of current on_shutdown are still running and sets $guests_shutting_down to remaining ones
  check_guests_shutdown "$uri" "$on_shutdown"

  on_shutdown="$guests_shutting_down"
  print_guests_shutdown "$uri" "$on_shutdown_prev" "$on_shutdown"

  Initially both shutdown guests are still on, so they are kept and nothing is reported as shut down.

    For now the inner loop repetition does not make progress due to on_shutdown already being 2 matching 2 of $PARALLEL_SHUTDOWN

  On the next check of check_guests_shutdown default 8b5645c2-12e5-4ec7-9350-08adb3e8cb0f ef55638c-926a-4d43-9790-99aae3010951 the guests are down

  on_shutdown= is set to an empty value
  on_shutdown_prev= 8b5645c2-12e5-4ec7-9350-08adb3e8cb0f ef55638c-926a-4d43-9790-99aae3010951 (from before the check)

  print_guests_shutdown reports the two guests as GONE
  LOOPS UP

  It detects that on shutdown is now count 0
  FIND#1: the check against $guests is not the long VAR
  Something modified $guests to the value of on_shutdown and this is what breaks progress.

  From here it is an infinite loop until timeout.

  This is a sco...
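
The "pop one guest" step from the walkthrough above can be sketched in isolation as follows. The UUIDs are replaced by short placeholder names, and this is only an illustration of the "set --" / shift idiom, not an excerpt from libvirt-guests.sh.

```shell
#!/bin/sh
# Pop the first word of a space-separated list using the positional
# parameters, as the walkthrough describes.
guests="uuid-a uuid-b uuid-c"

set -- $guests      # $1=uuid-a $2=uuid-b $3=uuid-c
guest=$1            # take the head of the list
shift               # drop it from the positional parameters
guests=$*           # what remains becomes the new list

echo "popped: $guest"
echo "remaining: $guests"
```

Because the remaining list is written back to $guests on every pass, anything else that assigns to a variable of the same name in between will silently change what is left to process.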


Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

After some debugging I found why it isn't correctly iterating anymore.

Note Local Branch: fix-lp1688508-guestscoping

Submitted to upstream as: https://www.redhat.com/archives/libvir-list/2018-March/msg01068.html

P.S. It wasn't even our fix that broke it, we just made it get further and then break on this under certain conditions (more guests than parallel) - it always had the chance to break that way.

Once accepted upstream we can add it in Bionic and from there reconsider a better SRU.
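
For readers unfamiliar with shell scoping, the general class of problem can be illustrated like this. The function names are invented and this is not the upstream patch; it only shows how a helper that assigns to a name also used by its caller changes the caller's value unless the variable is declared local.

```shell
#!/bin/sh
# Illustration only: a helper that reuses the caller's variable name.
helper_clobbering() {
    guests=$1                 # no "local": overwrites the caller's $guests
}
helper_scoped() {
    local guests              # not POSIX, but supported by dash and bash
    guests=$1
}

guests="a b c d"
helper_scoped "a b"
after_scoped=$guests          # caller's list is untouched

helper_clobbering "a b"
after_clobbering=$guests      # caller's list has been replaced

echo "after scoped helper:     $after_scoped"
echo "after clobbering helper: $after_clobbering"
```

With more guests than PARALLEL_SHUTDOWN, losing the tail of the list in this way would leave the remaining guests unprocessed until the timeout, which is consistent with the behaviour described in the notes.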

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Follow on fix accepted upstream as [1].
next: need to finish a few more fixes to become one upload in Bionic.

[1]: https://libvirt.org/git/?p=libvirt.git;a=commit;h=77cd862fb5e1d61922ab945f52ad94f7753704a5

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The further patches I want to upstream need another round of polishing and re-submission until they can be accepted.
But this one only has accepted changes, so let's unblock this issue by pushing an interim version through regression tests and uploading it once it is good.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 4.0.0-1ubuntu6

---------------
libvirt (4.0.0-1ubuntu6) bionic; urgency=medium

  * Backport from recent upstream to stabilize libvirt (LP: #1756915)
    - d/p/stable/0033-qemu-Fix-comparison-assignment-in-qemuDomainUpdateDe.patch
    - d/p/stable/0034-qemu-Fix-memory-leak-in-qemuConnectGetAllDomainStats.patch
    - d/p/stable/0035-libvirtd-fix-potential-deadlock-when-reloading.patch
    - d/p/stable/0036-qemu-Use-correct-bus-type-for-input-devices.patch
    - d/p/stable/0037-qemu-hostdev-Fix-the-error-on-VM-start-with-an-mdev-.patch
    - d/p/stable/0038-conf-Fix-crash-in-virDomainDefCompatibleDevice.patch
  * d/p/ubuntu/lp1688508-tools-fix-variable-scope-in-in-check_guests_shutdown:
    avoid issues shutting down more guests than configured for parallel
    shutdown (LP: #1688508)
  * d/p/ubuntu-aa/lp1756394-virt-aa-helper-resolve-file-symlinks.patch: fix
    using devices that are symlinks (LP: #1756394)

 -- Christian Ehrhardt <email address hidden> Mon, 19 Mar 2018 14:57:08 +0100

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

After identifying the former issue there are new uploads for the SRU Team to consider accepting in regard to this bug:
- Artful: libvirt_3.6.0-1ubuntu6.5_source.changes
- Xenial: libvirt_1.3.1-1ubuntu10.21_source.changes (This also includes a fix for 1753604)

Note: test builds in https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3224

Regression tests started on all arches

@Dariusz - feel free to test against the PPA instead of waiting for another full SRU in proposed.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Tested shutting down 40 guests with this through the service - worked fine this time.
If shutdown is really, really fast (like small guests on a speedy host), you might see "Failed to determine state of guest ..." but that is fine. It only means the guest was gone quickly; we don't have to wait for it and can go on.
"Failed" is probably too harsh a word there, but this is how we have it upstream, so we keep it as is.

Regression tests good as well, moving it to x-unapproved and a-unapproved for the SRU Team to consider.

Changed in libvirt (Ubuntu Artful):
status: Triaged → In Progress
Changed in libvirt (Ubuntu Xenial):
status: Triaged → In Progress
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

Christian, works perfectly fine here as well (tested from the ppa).
I notice the "Failed to determine..." messages, but nonetheless the guests are correctly shut down in a jiffy.

Thanks!

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Christoph, or anyone else affected,

Accepted libvirt into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/3.6.0-1ubuntu6.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-artful
removed: verification-failed verification-failed-artful
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Christoph, or anyone else affected,

Accepted libvirt into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/1.3.1-1ubuntu10.21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
removed: verification-failed-xenial
Revision history for this message
Dariusz Gadomski (dgadomski) wrote :

I've just finished verification for the following versions:
artful: 3.6.0-1ubuntu6.5
xenial: 1.3.1-1ubuntu10.21

No issues found in the scope of the test case (from the bug description).

tags: added: verification-done-artful verification-done-xenial
removed: verification-needed-artful verification-needed-xenial
tags: removed: verification-needed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: I've seen and resolved some autopkgtest regressions, but one will remain: cockpit@arm.
For that I have opened https://code.launchpad.net/~paelzer/britney/hints-ubuntu-artful-cockpit/+merge/342915

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 3.6.0-1ubuntu6.5

---------------
libvirt (3.6.0-1ubuntu6.5) artful; urgency=medium

  * d/p/ubuntu/lp1688508-fix-variable-scope-in-in-check_guests_shutdown.patch:
    backport further upstream fixes that were identified on verification.
    Together with the former change this fixes (LP: #1688508)

libvirt (3.6.0-1ubuntu6.4) artful; urgency=medium

  * d/p/ubuntu/lp1688508-tools-avoid-text-spilling-into-variables.patch:
    avoid hanging on shutdown (LP: #1688508)

 -- Christian Ehrhardt <email address hidden> Tue, 03 Apr 2018 16:23:04 +0200

Changed in libvirt (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 1.3.1-1ubuntu10.21

---------------
libvirt (1.3.1-1ubuntu10.21) xenial; urgency=medium

  * d/p/ubuntu/lp1688508-fix-variable-scope-in-in-check_guests_shutdown.patch:
    backport further upstream fixes that were identified on verification.
    Together with the former change this fixes (LP: #1688508)
  * d/p/ubuntu/lp1753604-nwfilter-fix-lock-order-deadlock.patch:
    fix intermittent deadlock in NWFilter handling (LP: #1753604)

libvirt (1.3.1-1ubuntu10.20) xenial; urgency=medium

  * d/p/ubuntu/lp1688508-tools-avoid-text-spilling-into-variables.patch:
    avoid hanging on shutdown (LP: #1688508)

 -- Christian Ehrhardt <email address hidden> Wed, 04 Apr 2018 10:46:12 +0200

Changed in libvirt (Ubuntu Xenial):
status: Fix Committed → Fix Released