Comment 0 for bug 1621336

Martin Pitt (pitti) wrote :

I can reproducibly trigger an indefinite hang when deploying services with Juju, while it prepares a new xenial testbed. The current xenial cloud image does not ship the latest snapd, so snapd gets upgraded as part of the dist-upgrade:

Preparing to unpack .../snapd_2.14.2~16.04_amd64.deb ...
Warning: Stopping snapd.service, but it can still be activated by:
  snapd.socket
Unpacking snapd (2.14.2~16.04) over (2.13) ...
Setting up snapd (2.14.2~16.04) ...
[...] hangs

The postinst tries to start snapd.boot-ok.service on upgrade:

           |-cloud-init(311)-+-apt-get(577)---dpkg(845)---snapd.postinst(846)---perl(919)---systemctl(922)
           | `-sh(354)---tee(355)

root 922 0.0 0.0 25316 1412 pts/0 S+ 06:09 0:00 /bin/systemctl start snapd.boot-ok.service

This hangs forever because:

 - cloud-init's dist-upgrade runs *during* the boot process, so the system is not yet fully booted when this happens (see bug 1576692); thus multi-user.target is *not* yet active

 - snapd.boot-ok.service is After=multi-user.target

 - "systemctl start" is synchronous by default, i.e. it waits until the service has started unless you use --no-block.

Thus snapd.postinst waits on snapd.boot-ok.service, which waits on multi-user.target, which waits on cloud-init to finish, which waits on snapd.postinst to finish.
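
To make the cycle explicit, here is a minimal sketch that models the wait chain described above as a wait-for graph and walks it until a node repeats. The WAITS_ON table and the find_wait_cycle helper are purely illustrative names, not part of snapd, systemd, or cloud-init; the edges are simply the "waits on" relations from this report.

```python
# Hypothetical wait-for graph: each key is blocked until its value completes.
# The edges mirror the chain described in this bug report.
WAITS_ON = {
    "snapd.postinst": "snapd.boot-ok.service",   # synchronous "systemctl start"
    "snapd.boot-ok.service": "multi-user.target",  # After=multi-user.target
    "multi-user.target": "cloud-init",           # boot is not finished yet
    "cloud-init": "snapd.postinst",              # dist-upgrade runs dpkg
}


def find_wait_cycle(waits_on, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    seen = {}  # node -> index in path where it was first visited
    node = start
    while node is not None:
        if node in seen:
            # Revisited a node: everything from its first visit on is the cycle.
            return path[seen[node]:] + [node]
        seen[node] = len(path)
        path.append(node)
        node = waits_on.get(node)
    return None


cycle = find_wait_cycle(WAITS_ON, "snapd.postinst")
print(" -> ".join(cycle))

# Modelling --no-block (or not starting the unit at all) removes the
# postinst's outgoing edge, which breaks the cycle in this model.
no_block = {k: v for k, v in WAITS_ON.items() if k != "snapd.postinst"}
print(find_wait_cycle(no_block, "snapd.postinst"))
```

In this model, dropping any single edge breaks the deadlock; the postinst edge is the one the package itself controls.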

I think conceptually you shouldn't start snapd.boot-ok.service in the postinst: if the system is already booted (manual dist-upgrade) it should already be running, and if it gets upgraded during boot (with cloud-init) then you shouldn't pretend that booting has already finished. So I suggest using dh_installinit with --no-scripts for snapd.boot-ok.service.