snapd.boot-ok.service hangs eternally on cloud image upgrades
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| cloud-init (Ubuntu) | Fix Released | High | Unassigned | |
| Xenial | Fix Released | Medium | Unassigned | |
Bug Description
==== Begin SRU Template [cloud-init] ====
[Impact]
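One of cloud-init's features is upgrading the system during first boot so that it is fully up to date when user code starts running. On xenial, if that upgrade pulls in a new snapd, cloud-init hangs indefinitely in snapd's postinst and first boot never completes.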
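The first-boot upgrade is driven by user-data. As a minimal sketch, the hypothetical `write_user_data` helper below writes a `#cloud-config` file using the standard `package_update`, `package_upgrade` and `packages` keys. The test case itself only uses `packages: [snapd]`, which is already enough to pull in the newer snapd on an old image.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-init): write a #cloud-config file
# asking cloud-init to refresh the package index, upgrade installed packages
# and install snapd during first boot, i.e. the path that triggers this bug.
write_user_data() {
    out=${1:-user-data}
    cat > "$out" <<'EOF'
#cloud-config
package_update: true
package_upgrade: true
packages: [snapd]
EOF
}
```

For example, `write_user_data user-data` produces a file that can be passed to the instance in place of the `printf` output used in the test case.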
[Test Case]
Launch an old 16.04 instance that needs a snapd update, with
user-data requesting a package upgrade.
$ lxc image show ubuntu:74a491804877
autoupdate: false
properties:
  aliases: 16.04,default,
  architecture: amd64
  description: ubuntu 16.04 LTS amd64 (release) (20160830)
  label: release
  os: ubuntu
  release: xenial
  serial: "20160830"
  version: "16.04"
public: true
$ printf "#%s\n%s\n" cloud-config "packages: [snapd]" > user-data
$ lxc launch ubuntu:74a491804877 xrecreate "--config=
$ lxc exec xrecreate -- tail -f /var/log/
# you will see the output log hang at:
# Setting up snapd (2.14.2~16.04) ...
## Now get new container and patch in cloud-init
$ lxc launch ubuntu:74a491804877 xpatched
# let it boot, with no user-data saying to update.
$ sleep 10
# update the container to new cloud-init, then clean it to make
# it look like first boot again.
$ lxc file push - xpatched/
$ lxc exec xpatched -- sh -c '
p=/
echo deb http://
apt-get update -q && apt-get -qy install cloud-init'
$ lxc exec xpatched -- sh -c '
cd /var/lib/cloud && for d in *; do [ "$d" = "seed" ] || rm -Rf "$d"; done
rm -Rf /var/log/
$ lxc exec xpatched reboot
$ lxc exec xpatched -- tail -f /var/log/
# you should see snapd installed and a 'Cloud-init finished' message.
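Watching the log with `tail -f` leaves it to the reader to decide when a hang is a hang. A small polling helper with a timeout makes the pass/fail outcome explicit. The `wait_until` function below is a hypothetical sketch, not part of cloud-init or lxc; it relies only on cloud-init writing `/var/lib/cloud/instance/boot-finished` at the end of its final stage.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-init or lxc): re-run a command once
# per second until it succeeds or TIMEOUT seconds have elapsed.
# Returns 0 if the command succeeded, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_until 600 lxc exec xrecreate -- test -e /var/lib/cloud/instance/boot-finished` should time out on the unpatched container, because cloud-init never gets past snapd's postinst, and should succeed on `xpatched`. The 600-second limit is an arbitrary choice.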
[Regression Potential]
Moving package installation later in boot will likely change the behavior of some configurations. However, a larger set of things was previously unreliable, so this should make things more reliable overall.
==== End SRU Template [cloud-init] ====
I can reproducibly trigger an indefinite hang when deploying services with Juju, while it prepares a new xenial testbed. The current xenial cloud image does not ship the latest snapd, so snapd gets upgraded during the dist-upgrade:
Preparing to unpack .../snapd_
Warning: Stopping snapd.service, but it can still be activated by:
snapd.socket
Unpacking snapd (2.14.2~16.04) over (2.13) ...
Setting up snapd (2.14.2~16.04) ...
[...] hangs
The postinst tries to start snapd.boot-
| `-sh(354)
root 922 0.0 0.0 25316 1412 pts/0 S+ 06:09 0:00 /bin/systemctl start snapd.boot-
This hangs indefinitely because:
- cloud-init's dist-upgrade runs *during* the boot process, so the system is not yet fully booted when this happens (see bug 1576692); thus multi-user.target is *not* yet active
- snapd.boot-
- "systemctl start" is synchronous by default, i.e. it waits until the service has started unless you pass --no-block.
Thus snapd.postinst waits on snapd.boot-
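The cycle above can be modelled outside systemd. The sketch below is a toy simulation, not systemd behavior: hypothetical marker files in a temporary directory stand in for unit states. The "cloud-final" step runs the postinst, which starts "boot-ok"; "boot-ok" only starts once "multi-user" is reached; and "multi-user" is only reached once "cloud-final" has finished. A blocking start therefore never completes, while a non-blocking start (the analogue of `systemctl --no-block start`) lets the chain finish.

```shell
#!/bin/sh
# Toy model of the ordering cycle; marker files stand in for unit states.
# This does NOT talk to systemd.
#   simulate_boot block|no-block TIMEOUT
# Prints "completed" if the boot-ok stand-in eventually starts, or "hung"
# if the blocking start is still waiting after TIMEOUT seconds.
simulate_boot() {
    mode=$1
    timeout=$2
    dir=$(mktemp -d)

    # Stand-in for the service manager: multi-user is reached only after
    # cloud-final is done, and boot-ok (ordered after it) starts after that.
    (
        while [ ! -e "$dir/cloud-final.done" ]; do sleep 1; done
        touch "$dir/multi-user.reached"
        touch "$dir/boot-ok.started"
    ) >/dev/null 2>&1 &
    manager=$!

    # cloud-final runs the snapd postinst, which starts boot-ok.
    if [ "$mode" = "block" ]; then
        # Like plain "systemctl start": wait until boot-ok has started.
        # The real hang is unbounded; TIMEOUT only lets the demo end.
        waited=0
        while [ ! -e "$dir/boot-ok.started" ]; do
            if [ "$waited" -ge "$timeout" ]; then
                kill "$manager" 2>/dev/null || true
                wait "$manager" 2>/dev/null || true
                rm -rf "$dir"
                echo "hung"
                return 1
            fi
            sleep 1
            waited=$((waited + 1))
        done
    fi

    # Like "systemctl --no-block start": the job is only queued, so the
    # postinst returns and cloud-final can finish.
    touch "$dir/cloud-final.done"
    wait "$manager" || true
    rm -rf "$dir"
    echo "completed"
}
```

In `block` mode the wait can never succeed, because `boot-ok.started` depends on `cloud-final.done`, which is only written after the wait returns; that is the same circular dependency as the real boot.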
I think conceptually you shouldn't start snapd.boot-
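Given that `systemctl start` blocks unless `--no-block` is passed, one way a maintainer script could avoid the wait is to queue the start job without waiting for it. The fragment below is a hypothetical sketch, not snapd's actual postinst and not necessarily the fix that was adopted. The function name and the `SYSTEMCTL`/`SYSTEMD_RUNTIME_DIR` overrides are assumptions added only so the sketch can be exercised without a running systemd; the `/run/systemd/system` directory test is the usual way maintainer scripts check that systemd is PID 1.

```shell
#!/bin/sh
# Hypothetical postinst fragment; NOT snapd's actual maintainer script.
# SYSTEMCTL and SYSTEMD_RUNTIME_DIR are overridable only for testing.
SYSTEMCTL=${SYSTEMCTL:-systemctl}
SYSTEMD_RUNTIME_DIR=${SYSTEMD_RUNTIME_DIR:-/run/systemd/system}

start_boot_ok() {
    # Skip when systemd is not running as PID 1 (chroots, image builds, ...).
    [ -d "$SYSTEMD_RUNTIME_DIR" ] || return 0
    # --no-block enqueues the start job and returns at once, so this script
    # never waits on a unit that is ordered after the boot it is part of.
    "$SYSTEMCTL" --no-block start snapd.boot-ok.service || true
}
```

The trade-off is that the postinst no longer notices if the unit fails to start; on a fully booted system the queued job still runs immediately, while during boot it simply waits for multi-user.target.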
Related branches
- cloud-init Commiters: Pending requested
- Diff: 50 lines (+9/-9), 2 files modified:
  config/cloud.cfg (+8/-8)
  systemd/cloud-final.service (+1/-1)
tags: added: oil
Changed in cloud-init (Ubuntu):
  status: New → Confirmed
  importance: Undecided → High
Changed in snapd (Ubuntu Xenial):
  status: New → In Progress
  importance: Undecided → Medium
Changed in cloud-init (Ubuntu Xenial):
  status: New → In Progress
  importance: Undecided → Medium
Changed in cloud-init (Ubuntu Xenial):
  status: In Progress → Fix Committed
description: updated
tags: added: verification-needed; removed: verification-done
Changed in snapd (Ubuntu):
  status: Triaged → In Progress
Changed in snapd (Ubuntu Xenial):
  assignee: nobody → Eric Desrochers (slashd)
Changed in snapd (Ubuntu):
  status: Fix Committed → Confirmed
Status changed to 'Confirmed' because the bug affects multiple users.