libvirt-bin: during shutdown libvirt-bin is stopped before libvirt-guests causing hang

Bug #1829823 reported by Matthew Ruffell on 2019-05-21
20
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu Cloud Archive
Undecided
Unassigned
Mitaka
High
Matthew Ruffell
libvirt (Ubuntu)
Medium
Unassigned
Xenial
Wishlist
Christian Ehrhardt 

Bug Description

[Impact]

 * libvirt-bin, in: libvirt-1.3.1-1ubuntu10.24~cloud0 in trusty mitaka uca
   and the parent package in xenial, libvirt-1.3.1-1ubuntu10.24 are
   affected.

 * When you shutdown a system in trusty which is running some kvm virtual
   machines, the libvirt-bin service is stopped before libvirt-guests.
   libvirt-guests tries to connect to the libvirt socket to send shutdown
   commands to the running vms, which cannot happen since libvirtd is not
   running.

 * On some machines, the qemu processes behind the virtual machines are
   not killed and are left behind as defunct processes, which can cause
   the system to hang on them not being terminated.

 * The bug is caused by the libvirt-bin upstart script [1] calling a
   non-existant script, /usr/lib/libvirt/libvirt-stop-guests [2]. This
   script used to exist in the upstart script itself in version
   1.2.2-0ubuntu13.1.27 [3], the version in the trusty archives. In
   liberty UCA, version 1.2.16-2ubuntu11.15.10.4~cloud0 [4], the script
   was separated out into /usr/lib/libvirt/libvirt-stop-guests [2].
   In the mitaka release, the libvirt-stop-guests script was removed and
   rewritten as /etc/init.d/libvirt-guests [5], but the script in [1] was
   never updated to point to it.

   [1] http://paste.ubuntu.com/p/GxxBczkCmk
   [2] http://paste.ubuntu.com/p/fKCDQh46vh
   [3] http://paste.ubuntu.com/p/QrKXqK2Bvz
   [4] http://paste.ubuntu.com/p/W8DgQwpYv3
   [5] http://paste.ubuntu.com/p/Z28Sp2fPd6

 * Since the upstart script was never updated to point to it, libvirt-bin
   stops without stopping libvirt-guests first. When libvirt-guests is
   stopped later, it cannot access the libvirt socket, cannot shut down
   the machines, causing the bug.

 * The fix is to change the upstart script to point to the new libvirt-
   guests script.

[Test Case]

 * You can reproduce this in trusty with the mitaka UCA enabled.

 1) Enable mitaka UCA and install libvirt0 and libvirt-bin

 $ sudo add-apt-repository cloud-archive:mitaka
 $ sudo apt update
 $ sudo apt install libvirt0 libvirt-bin

 2) Install a virtual machine, either by using virt-install or
    virt-manager.
    I used a bionic VM.

 3) Enable debugging on libvirt-guests so you can see what is going on

 Modify /etc/init.d/libvirt-guests and add "-x" to the end of "!/bin/sh"

 4) With the vm running, shut down the system

 $ sudo shutdown -h now

 5) Check /var/log/upstart/libvirt-bin.log, on reboot. It will say
 "No such file or directory: /usr/lib/libvirt/libvirt-stop-guests"

 6) During that shutdown, you will see messages like:
 error: failed to connect to the hypervisor
 error: no valid connection
 error: Failed to connect socket to '/var/run/libvirt/libvirt-sock':
 No such file or directory

 What should happen:

 If you follow the same steps with the fixed package, when you look at
 /var/log/upstart/libvirt-bin.log, you will see output of libvirt-guests
 connecting to and shutting down the virtual machines which looks a little
 like this: https://paste.ubuntu.com/p/s4jyJX2y9F/

[Regression Potential]

 * There is only one file modified, the upstart script for libvirt-bin.
   Currently this upstart file references a file which doesn't exist, so
   fixing it will restore the behavior in a way which aligns with exactly
   what took place in previous versions.

 * In xenial, all of this isn't used at all - see below at "Other Info"

 * This change only effects systems during shutdown while they still
   have virtual machines running, and do not effect starting and stopping
   services while the machine is running normally.

 * I believe the regression potential is low.

[Other Info]

 * Xenial is not effected by this bug even though it ships the exact same
   packages. This is because xenial uses insserv to generate service
   dependency files ".depend.boot" ".depend.start" ".depend.stop" which
   parse the scripts in /etc/init.d/ and systemd respects the dependency
   ordering in these files.
   libvirt-guests reports a dependency on libvirt-bin in the script
   header, so systemd will always stop libvirt-guests before libvirt-bin,
   avoiding the problem seen in trusty.

 * The fix is needed in trusty mitaka UCA and xenial will likely need the
   SRU as part of the process.

 * We'd never have uploaded that change alone for xenial (being a no-op
   causing MBs to download and an upgrade. But we will bundle it with an
   actual change - so it can "ride along" to eventually help
   Trusty-Mitaka. Unfortunately there is no "current" Xenial SRU for
   libvirt, hence we want to get it into xenial-proposed (which is enough
   for the UCA tooling) but we do not want to release it to xenial-updates
   until another another SRU comes by which we will generate with a -v
   covering both then.

 * That way also if e.g. a security fix comes by it will be based on what
   is in proposed.

Changed in libvirt (Ubuntu):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in libvirt (Ubuntu Xenial):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in cloud-archive:
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
Changed in libvirt (Ubuntu):
importance: Undecided → Medium
Changed in libvirt (Ubuntu Xenial):
importance: Undecided → Medium
Changed in libvirt (Ubuntu):
status: New → In Progress
Changed in libvirt (Ubuntu Xenial):
status: New → In Progress
Changed in cloud-archive:
status: New → In Progress
description: updated
description: updated

Please do -NOT- make a Xenial SRU for that.
As we discussed on IRC and you pointed out in the report yourself - it is not affected due to systemd doing it right.

This should only be an Upstart related change that goes on top of the Cloud-Archive-Mitaka Delta.

Changed in libvirt (Ubuntu):
status: In Progress → Invalid
Changed in libvirt (Ubuntu Xenial):
status: In Progress → Invalid
Changed in libvirt (Ubuntu):
assignee: Matthew Ruffell (mruffell) → nobody
Changed in libvirt (Ubuntu Xenial):
assignee: Matthew Ruffell (mruffell) → nobody
Corey Bryant (corey.bryant) wrote :

Hi Matthew,

The Mitaka cloud archive reached end of life last month [1]. If you have a debdiff that fixes this then feel free to attach it to the bug and we may pick it up. I can't promise much though since we have other releases to focus on supporting. You might consider contacting canonical for ESM support if you are unable to upgrade from Trusty anytime soon.

[1] https://www.ubuntu.com/about/release-cycle

Thanks,
Corey

Matthew Ruffell (mruffell) wrote :

Hi Corey,

I have attached the debdiff for trusty-mitaka. I am from SEG, and have sent a test package to the customer for validation, so it can be included in Trusty ESM.

I'll let you know when the patch is verified.

Thanks,
Matthew.

Matthew Ruffell (mruffell) wrote :

Attached is the debdiff for xenial, in the case the SRU to xenial happens to keep libvirt in sync between xenial and trusty-mitaka.

I have done some additional testing in xenial, and I can confirm that the upstart script is not even called, and the sysvinit script is used instead. This means that there should be minimal regression risk, and it is more of a nop for xenial.

If you are especially interested in xenial, I have prepared a test package in a ppa here:
https://launchpad.net/~mruffell/+archive/ubuntu/lp1829823-libvirt-test

Hi Matthew,
I have discussed this with Corey on Friday.

My general preferred solution would still be to not do a Xenial upload at all. But after the talk with Corey he explained that for T-M there isn't really a good other way.
Just as you I have double checked and agree that this should not have any effect on Xenial, but there are two things to consider:
- people might still have tweaked their xenial to run with upstart (but for those it will be a fix)
- for the majority this will be a download of ~25MB for nor real change which is a waste of
  resources that we should not do.

Corey and I have agreed to handle it the following way:
- I'm willing to help you to push this to Xenial-Proposed
  - ME: re-Review the current solution
  - ME: sponsor it to Xenial unapprove
- YOU: If the SRU team objects you should do the discussion/convincing
- We will LEAVE it hanging in proposed and intentionally NOT release to Xenial it as-is
  - that means only testers of proposed will pick it up
  - When there is a libvirt fix later any normal SRU or security push will be based
    on what is in proposed
  - Only then it will be really available to Xenial users
- We didn't get to the point if the Cloud-Archive could sync from -proposed in this case
  - for now I expect the answer is no, which means you have no ETA when this will
    really be released

@Mathew - please confirm that this approach will work for you, then I'll do my steps
@Corey - please confirm here that this approach will work for you, also add if UCA-T-M could sync from proposed once it is verified there.

You are lucky, I have another libvirt Xenial SRU in the queue now.
I'll bundle this change.

Changed in libvirt (Ubuntu Xenial):
status: Invalid → Fix Released
description: updated
Changed in libvirt (Ubuntu Xenial):
status: Fix Released → Triaged
assignee: nobody → Christian Ehrhardt  (paelzer)
Changed in cloud-archive:
assignee: Matthew Ruffell (mruffell) → nobody
status: In Progress → Invalid
Corey Bryant (corey.bryant) wrote :

@Christian, looks like we can treat this as business as usual and wait on xenial-updates promotion prior to mitaka-updates promotion per comment #6. Thanks for the help!

Uploaded to Xenial unapproved as part of an other SRU that was required (to avoid extra builds/downloads)

Actually I need to revert that for a bit, the bundled fix only works on >=B so I need to rework it.

Sorry, "the other bug" resolved by actually not being valid for Xenial.
Due to that this bug here is again back at "maybe some day" - sorry for the false hope.

Changed in libvirt (Ubuntu Xenial):
importance: Medium → Wishlist

As discussed uploaded to xenial-unapproved under the condition that we will keep it in -proposed until another SRU (or security fix) will join it and be released together then. Per Corey this will be enough for the UCA tools to pick it up.

description: updated
Corey Bryant (corey.bryant) wrote :

@Christian, Thanks. Yes once this is accepted into xenial-proposed we can begin the backport to mitaka-proposed and make an exception in this case, once tested, to promote to mitaka-updates prior to xenial-updates promotion.

Note: This does leave a slight chance of regression on upgrade from mitaka-updates (trust) to xenial-updates if for some reason upstart is used on xenial.

> Note: This does leave a slight chance of regression on upgrade from
> mitaka-updates (trust) to xenial-updates if for some reason upstart is
> used on xenial.

Not really, as an upgrader to Xenial will use systemd which already
does the ordering right by considering the sysV headers and building a
chain based on that.
Due to that I think this isn't even a small window for an upgrade regression.

Robie Basak (racb) wrote :

I need to raise this with the SRU team generally I think.

Recently we clarified in https://wiki.ubuntu.com/StableReleaseUpdates#When_not_to_upload that we don't want to send users updates when there are no users affected at runtime. I understand that this causes difficulty for the Canonical Cloud Archive, but I'm not sure that's a justification - you can upload to that directly without the SRU. This is what I'd like to consult the others on.

We may permit the proposed pocket to be used as a place to stage updates until they can be bundled with user-affecting fixes, but we haven't worked out the details of that yet.

Robie Basak (racb) wrote :

Am I correct in my understanding that it would be fine for the cloud archive tooling if we landed the upstart change in proposed but it does not go to updates, pending a future SRU or security update?

On Thu, Jul 18, 2019 at 12:30 PM Robie Basak <email address hidden> wrote:
>
> Am I correct in my understanding that it would be fine for the cloud
> archive tooling if we landed the upstart change in proposed but it does
> not go to updates, pending a future SRU or security update?

Yes coreycb confirmed that (I had the same question) somewhere above already

Robie Basak (racb) wrote :

Thanks. Following discussion with SRU team, we concluded to use this bug as a trial for staging changes in proposed, if you're happy with that.

The idea is that we'll accept the upload into proposed but without the intention of it ever landing in updates. Future uploads should be based on the package in proposed, thus bundling this update and using proposed as the staging area.

We will tag "block-proposed" and see how tooling handles that and what changes may be needed.

We still still require a commitment for timely regular SRU verification to be performed on the upload please, for all bugs affected (just this one in this case I think?). This is to prevent a future uploader of a different bug being blocked on SRU verification of an unrelated staged upload.

If timely SRU verification is not performed, I expect the upload will end up being deleted from proposed, and/or a future upload being accepted that is not based on it, and that would of course defeat the point.

Corey and Christian: would this be acceptable to you both?

> Corey and Christian: would this be acceptable to you both?

For me - absolutely and I'm happy that we end up with sort of a
defined process for things like that.

Corey Bryant (corey.bryant) wrote :

@Robie, thanks for helping define a process for this situation. It is acceptable for me.

Robie Basak (racb) wrote :

Christian's upload got trumped by a security update. I have rebased it for you.

tags: added: block-proposed

Hello Matthew, or anyone else affected,

Accepted libvirt into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/1.3.1-1ubuntu10.28 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libvirt (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-xenial
Matthew Ruffell (mruffell) wrote :

On xenial, I installed libvirt-bin 1.3.1-1ubuntu10.27 from -updates,
as well as virt-manager and qemu-system. I installed a bionic desktop VM and
issued "sudo shutdown -h now". libvirt-guests then connected to the VM and shut
it down gracefully before the system was shut down.

I then upgraded libvirt-bin to 1.3.1-1ubuntu10.28 from -proposed, started the
bionic VM and issued a system shutdown. Again, libvirt-guests then connected to
the VM and shut it down gracefully before the system was shut down.

The package is still a nop on xenial, and does not cause any regressions.

I looked at the debdiff for 1.3.1-1ubuntu10.28 and applied the changes to a
trusty system, where I had previously reproduced the problem. The modifications
from the debdiff made trusty libvirt-bin respect libvirt-guests, and the VM
that was running was shutdown gracefully before the system was shut down.

I am happy to mark this bug as verified, as I am happy that xenial will not
suffer any regressions, and that once the package makes its way into the
trusty mitaka cloud archive, the issue will be fixed.

tags: added: verification-done-xenial
removed: verification-needed verification-needed-xenial
Hari Kumar (harichkumar) on 2019-09-04
Changed in libvirt (Ubuntu Xenial):
status: Fix Committed → Fix Released
Changed in libvirt (Ubuntu Xenial):
status: Fix Released → Fix Committed
tags: added: block-proposed-bionic
removed: block-proposed
Robie Basak (racb) wrote :

I think you meant Xenial, not Bionic :)

tags: added: block-proposed-xenial
removed: block-proposed-bionic
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers