Snapd postinst script hangs

Bug #1868706 reported by Mark Shuttleworth on 2020-03-24
This bug affects 2 people
Affects Status Importance Assigned to Milestone
dpkg (Ubuntu)
systemd (Ubuntu)

Bug Description

On a machine which has been tracking upgrades for a while, I am unable to install new versions of snapd. The install process stalls indefinitely. Processes running suggest this is a hang in the postinst:

~$ ps ax | grep dpkg
   2388 pts/1 Ss+ 0:00 /usr/bin/dpkg --status-fd 25 --configure --pending
   2389 pts/1 S+ 0:00 /bin/sh /var/lib/dpkg/info/snapd.postinst configure 2.43.3+git1.8109f8
   2875 pts/2 S+ 0:00 grep --color=auto dpkg

~$ pstree 2388

~$ ps ax | grep snapd
    732 ? Ssl 0:00 /usr/lib/snapd/snapd
   2389 pts/1 S+ 0:00 /bin/sh /var/lib/dpkg/info/snapd.postinst configure 2.43.3+git1.8109f8
   2452 pts/1 S+ 0:00 /bin/systemctl start snapd.autoimport.service snapd.core-fixup.service snapd.recovery-chooser-trigger.service snapd.seeded.service snapd.service snapd.snap-repair.timer snapd.socket snapd.system-shutdown.service
   2880 pts/2 S+ 0:00 grep --color=auto snapd

Changed in snapd:
importance: Undecided → Critical
status: New → Triaged
Paweł Stołowski (stolowski) wrote :

At the moment the main suspect is the systemd-tty-ask (which would block with a password prompt), although it might be a long shot. I've just tried with the latest focal desktop image, there were lots of debs to upgrade (including snapd) but I couldn't reproduce it. The postinst is relatively simple, it just starts all mount units of snaps with a systemctl wrapper.

Could you please:
- paste `pstree -u` output for the pid of systemctl process (the one under 2388's tree)
- paste the output of 'ls -l /proc/<systemctl pid>/fd/0'
- attach /var/log/auth.log

Thank you!

Paweł Stołowski (stolowski) wrote :

I'm tentatively adding systemd/dpkg.

Paweł Stołowski (stolowski) wrote :

Also, the list of installed snaps may be relevant in understanding what happens.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dpkg (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed
Mark Shuttleworth (sabdfl) wrote :

OK, so *now* the situation has cleared on that machine. And there are no snaps installed :/

Balint Reczey (rbalint) on 2020-03-26
Changed in dpkg (Ubuntu):
status: Confirmed → Invalid
Balint Reczey (rbalint) wrote :

@sabdfl How was dpkg started? As root systemctl should not ask for password.

@stolowski The snapd version seems to be a git snapshot, not one from the archive.

Marking as invalid for dpkg and systemd because nothing shows that there would be any issue with their behaviour. Please reopen the bug against one of them if the root cause seem to be in them.

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Paweł Stołowski (stolowski) wrote :

@Mark does 'snap changes' (and then 'snap change <ID>' if you see any failed changes) reveal antyhing interesing? Some of the older focal images had broken seeds that would never succeed, potentially causing other issues.

Paweł Stołowski (stolowski) wrote :

Ok, reproduced with focal and broken seeds from an older image, symptoms are exactly the same (including ps/pstree output). The problem is caused by blocking on snapd.seeded.service (which waits for seeding to complete, but it's never going to complete with broken seeds).

Given that this is an edge case (we don't expect images with broken seeds) and focal images have been fixed already, I'm lowering the priority. We will discuss what to do to handle this gracefully.

Changed in snapd:
importance: Critical → High

Pawel, this is great debugging, thank you. Agree it's an edge case.


Silas S. Brown (ssb22) wrote :

Very similar thing happened on Lubuntu 20.04, `apt-get dist-upgrade` said

    Unpacking snapd (2.47.1+20.04) over (2.46.1+20.04) ...
    Setting up snapd (2.47.1+20.04) ...
    Installing new version of config file /etc/apparmor.d/usr.lib.snapd.snap-confine.real ...

and hung. `pstree` just showed `apt-get---dpkg---snapd.postinst` (no children). It ignored `SIGINT` and `SIGTERM`. `ps auxwww` shows

    root 4091 0.0 0.0 2608 1740 pts/1 S+ 09:42 0:00 /bin/sh /var/lib/dpkg/info/snapd.postinst configure 2.46.1+20.04

and `ls -l /proc/4091/fd` gives

    lrwx------ 1 root root 64 Nov 3 11:43 0 -> /dev/pts/1
    l-wx------ 1 root root 64 Nov 3 11:44 1 -> /dev/null
    lr-x------ 1 root root 64 Nov 3 11:44 10 -> /var/lib/dpkg/info/snapd.postinst
    lrwx------ 1 root root 64 Nov 3 11:44 11 -> /dev/pts/1
    lrwx------ 1 root root 64 Nov 3 11:44 2 -> /dev/pts/1

(timestamps being the current time of the `ls`, not the process start time which was about 2 hours earlier) I was connecting to the machine via `ssh` from elsewhere on our home network.

After `kill -9 4091` the parent `dpkg` process was still hung, `ps auxwww` gives:

    root 4090 0.0 0.2 18092 9932 pts/1 Ss+ 09:42 0:00 /usr/bin/dpkg --status-fd 55 --configure --pending

and `ls -l /proc/4090/fd` gives:

    lrwx------ 1 root root 64 Nov 3 11:49 0 -> /dev/pts/1
    lrwx------ 1 root root 64 Nov 3 11:49 1 -> /dev/pts/1
    lrwx------ 1 root root 64 Nov 3 11:49 2 -> /dev/pts/1
    lrwx------ 1 root root 64 Nov 3 11:49 3 -> /var/lib/dpkg/lock
    l-wx------ 1 root root 64 Nov 3 11:49 4 -> /var/lib/dpkg/updates/tmp.i
    lrwx------ 1 root root 64 Nov 3 11:49 5 -> /var/lib/dpkg/triggers/Lock
    l-wx------ 1 root root 64 Nov 3 11:49 55 -> 'pipe:[55607]'
    lr-x------ 1 root root 64 Nov 3 11:49 6 -> /var/lib/dpkg/triggers/Unincorp
    l-wx------ 1 root root 64 Nov 3 11:49 7 -> /var/log/dpkg.log
    lr-x------ 1 root root 64 Nov 3 11:49 8 -> /var/lib/dpkg/diversions

After `kill -9 4090` the original SSH session gets

    E: Sub-process /usr/bin/dpkg exited unexpectedly
    W: Operation was interrupted before it could finish

and dropped back to root prompt. I then tried `dpkg --configure -a` and it apparently went straight through:

    Setting up snapd (2.47.1+20.04) ...
    snapd.failure.service is a disabled or a static unit, not starting it.
    snapd.snap-repair.service is a disabled or a static unit, not starting it.

and then carried on setting up the other packages. So I'm logging the above in the hope that somebody can read more meaning into it than I can.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers