Error during refresh of snapd leads to unusable system

Bug #1899794 reported by Jean-Baptiste Lallement
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Fix Released
High
Unassigned
snapd (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

$ snap version
snap 2.47.1+20.10
snapd 2.47.1+20.10
series 16
ubuntu 20.10
kernel 5.8.0-22-generic

System fully up to date

Symptom: I cannot launch any snap. Running from a terminal shows for example:
$ spotify
cannot stat /var/lib/snapd/seccomp/bpf/snap.spotify.spotify.bin: No such file or directory

snap list returns the list of snaps

snap changes reports one error:
ID Status Spawn Ready Summary
65 Error today at 13:04 CEST today at 13:06 CEST Auto-refresh snaps "snapd", "squirrelsql"

Details for this changes report:
$ snap change 65
Status Spawn Ready Summary
Done today at 13:04 CEST today at 13:06 CEST Ensure prerequisites for "snapd" are available
Undone today at 13:04 CEST today at 13:06 CEST Download snap "snapd" (9607) from channel "latest/stable"
Done today at 13:04 CEST today at 13:06 CEST Fetch and check assertions for snap "snapd" (9607)
Undone today at 13:04 CEST today at 13:06 CEST Mount snap "snapd" (9607)
Undone today at 13:04 CEST today at 13:06 CEST Run pre-refresh hook of "snapd" snap if present
Undone today at 13:04 CEST today at 13:06 CEST Stop snap "snapd" services
Undone today at 13:04 CEST today at 13:04 CEST Remove aliases for snap "snapd"
Undone today at 13:04 CEST today at 13:04 CEST Make current revision for snap "snapd" unavailable
Undone today at 13:04 CEST today at 13:04 CEST Copy snap "snapd" data
Error today at 13:04 CEST today at 13:04 CEST Setup snap "snapd" (9607) security profiles
Hold [...]

Full log from that time attached to the bug report in journal.log

$ ls -ld /snap/snapd/*/usr/lib/snapd/snap-seccomp
-rwxr-xr-x 1 root root 2306928 sept. 4 18:33 /snap/snapd/9279/usr/lib/snapd/snap-seccomp
-rwxr-xr-x 1 root root 2306928 sept. 30 06:59 /snap/snapd/9607/usr/lib/snapd/snap-seccomp
-rwxr-xr-x 1 root root 2306928 sept. 30 06:59 /snap/snapd/current/usr/lib/snapd/snap-seccomp

There is no version 8790

$ sudo ls -l /proc/70906/exe
[...]
lrwxrwxrwx 1 root root 0 oct. 14 15:02 /proc/70906/exe -> /usr/lib/snapd/snapd

refreshing snapd manually "fixes" the issue.

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
summary: - snapd error during refresh
+ snapd error during refresh leads to unusable system
summary: - snapd error during refresh leads to unusable system
+ Error during refresh of snapd leads to unusable system
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

As discussed on IRC with jibel, there was a package update happening around from 07:01:25 to 07:03:53 on Oct 12.

Looking at the logs, I suspect the following happened:

- a reexeced snapd was running
- snapd package update requested systemctl restart snapd.service
- the snapd service hit a kill timeout (unclear why, there is a log that indicates a graceful shutdown did not happen after 25s, then systemd proceeds to kill the process, which exits due to SIGTERM)
- snapd.failure.service kicks in, calls systemctl reset-failed on snapd.service
- snapd.failure starts 'previous' snapd with SNAPD_REVERT_TO_REV, but that has no meaning, because on the last run, there was no update of snapd happening
- package update calls systemctl start snapd

<at this point there are 2 snapd processes running, one under snap-failure, another started by postinst, state gets corrupted with both snapd processes modifying it>

- some time later (oct. 14 13:04:58) a refresh occurs in the process run from snap-failure, snapd exits, snapd.failure service finishes successfully
- snapd service is restarted and operates for a while
- reboot
- snapd service attempts to refresh the snapd snap, but is confused by the state not matching actual situation in the system

The problems we need to address is:
- make sure that is only one instance of snapd running
- investigate why snapd shutdown gets killed
- make sure that snapd started by snap-failure acts correctly when there was no snapd update

Changed in snapd:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

I believe a problem addressing snap-failure and snapd has been fixed a couple of releases back. Should the problem persist, please leave a comment.

Changed in snapd (Ubuntu):
status: New → Fix Released
Changed in snapd:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.