I reproducibly run into an eternal hang when deploying services with Juju, when it prepares a new xenial testbed. The current xenial cloud image does not have the latest snapd, so snapd gets dist-upgraded:
Preparing to unpack .../snapd_2.14.2~16.04_amd64.deb ...
Warning: Stopping snapd.service, but it can still be activated by:
snapd.socket
Unpacking snapd (2.14.2~16.04) over (2.13) ...
Setting up snapd (2.14.2~16.04) ...
[...] hangs
The postinst tries to start snapd.boot-ok.service on upgrade:
| `-sh(354)
root 922 0.0 0.0 25316 1412 pts/0 S+ 06:09 0:00 /bin/systemctl start snapd.boot-ok.service
This hangs eternally because:
- cloud-init's dist-upgrade runs *during* the boot process, so that the system is not fully booted yet when this happens (see bug 1576692); thus multi-user.target is *not* yet active
- snapd.boot-ok.service is After=multi-user.target
- "systemctl start" is synchronous by default, i.e. it waits until the service is started unless you use --no-block.
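To make the difference between the blocking and the non-blocking call concrete, here is a minimal sketch that only builds the systemctl command line. The helper names systemctl_start_argv and start_unit are hypothetical and not part of snapd or its packaging; --no-block is the only systemctl option used.

```python
import subprocess


def systemctl_start_argv(unit, block=True):
    """Build the argv for starting `unit` (hypothetical helper, for illustration).

    With block=True this is the plain synchronous call that the postinst
    hangs on; with block=False, --no-block only enqueues the start job.
    """
    argv = ["systemctl"]
    if not block:
        argv.append("--no-block")
    argv += ["start", unit]
    return argv


def start_unit(unit, block=True):
    """Run the command and return its exit status (needs a systemd host)."""
    return subprocess.run(systemctl_start_argv(unit, block), check=False).returncode


# Only print the two command lines; nothing is executed here.
print(" ".join(systemctl_start_argv("snapd.boot-ok.service")))
print(" ".join(systemctl_start_argv("snapd.boot-ok.service", block=False)))
```

Note that --no-block would only avoid the hang; it does not address whether the unit should be started from the postinst at all.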
Thus snapd.postinst waits on snapd.boot-ok.service, which waits on multi-user.target, which waits on cloud-init to finish, which in turn waits on snapd.postinst to finish.
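The circular wait can be written down as a small wait-for graph. This is only an illustrative model of the chain described here, not code from snapd, cloud-init or systemd; the names WAITS_ON and find_wait_cycle are made up for this sketch.

```python
# Wait-for graph taken from the description: each key waits on its value.
WAITS_ON = {
    "snapd.postinst": "snapd.boot-ok.service",
    "snapd.boot-ok.service": "multi-user.target",
    "multi-user.target": "cloud-init",
    "cloud-init": "snapd.postinst",
}


def find_wait_cycle(waits_on, start):
    """Follow the wait chain from `start`; return the cycle as a list, or None."""
    path = []
    position = {}  # node -> index in path
    node = start
    while node is not None:
        if node in position:
            # We came back to a node already on the path: that is the cycle.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = waits_on.get(node)
    return None


cycle = find_wait_cycle(WAITS_ON, "snapd.postinst")
print(" -> ".join(cycle))

# On an already booted system multi-user.target is active and waits on nothing,
# so the chain simply ends there and no cycle is found.
booted = {k: v for k, v in WAITS_ON.items() if k != "multi-user.target"}
print(find_wait_cycle(booted, "snapd.postinst"))
```

The second call models the manual dist-upgrade case, where the same postinst does not hang.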
I think conceptually you shouldn't start snapd.boot-ok.service in the postinst: if the system is already booted (manual dist-upgrade), the service should already be running, and if snapd gets upgraded during boot (with cloud-init), you shouldn't pretend that booting has already finished. So I suggest using dh_installinit with --no-scripts for snapd.boot-ok.service.