Error during refresh of snapd leads to unusable system

Bug #1899794 reported by Jean-Baptiste Lallement
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
High
Unassigned
snapd (Ubuntu)
Undecided
Unassigned

Bug Description

$ snap version
snap 2.47.1+20.10
snapd 2.47.1+20.10
series 16
ubuntu 20.10
kernel 5.8.0-22-generic

System fully up to date

Symptom: I cannot launch any snap. Running from a terminal shows for example:
$ spotify
cannot stat /var/lib/snapd/seccomp/bpf/snap.spotify.spotify.bin: No such file or directory

snap list returns the list of snaps

snap changes reports one error:
ID Status Spawn Ready Summary
65 Error today at 13:04 CEST today at 13:06 CEST Auto-refresh snaps "snapd", "squirrelsql"

Details for this changes report:
$ snap change 65
Status Spawn Ready Summary
Done today at 13:04 CEST today at 13:06 CEST Ensure prerequisites for "snapd" are available
Undone today at 13:04 CEST today at 13:06 CEST Download snap "snapd" (9607) from channel "latest/stable"
Done today at 13:04 CEST today at 13:06 CEST Fetch and check assertions for snap "snapd" (9607)
Undone today at 13:04 CEST today at 13:06 CEST Mount snap "snapd" (9607)
Undone today at 13:04 CEST today at 13:06 CEST Run pre-refresh hook of "snapd" snap if present
Undone today at 13:04 CEST today at 13:06 CEST Stop snap "snapd" services
Undone today at 13:04 CEST today at 13:04 CEST Remove aliases for snap "snapd"
Undone today at 13:04 CEST today at 13:04 CEST Make current revision for snap "snapd" unavailable
Undone today at 13:04 CEST today at 13:04 CEST Copy snap "snapd" data
Error today at 13:04 CEST today at 13:04 CEST Setup snap "snapd" (9607) security profiles
Hold [...]

Full log from that time attached to the bug report in journal.log

$ ls -ld /snap/snapd/*/usr/lib/snapd/snap-seccomp
-rwxr-xr-x 1 root root 2306928 sept. 4 18:33 /snap/snapd/9279/usr/lib/snapd/snap-seccomp
-rwxr-xr-x 1 root root 2306928 sept. 30 06:59 /snap/snapd/9607/usr/lib/snapd/snap-seccomp
-rwxr-xr-x 1 root root 2306928 sept. 30 06:59 /snap/snapd/current/usr/lib/snapd/snap-seccomp

There is no version 8790

$ sudo ls -l /proc/70906/exe
[...]
lrwxrwxrwx 1 root root 0 oct. 14 15:02 /proc/70906/exe -> /usr/lib/snapd/snapd

refreshing snapd manually "fixes" the issue.

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
summary: - snapd error during refresh
+ snapd error during refresh leads to unusable system
summary: - snapd error during refresh leads to unusable system
+ Error during refresh of snapd leads to unusable system
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

As discussed on IRC with jibel, there was a package update happening around from 07:01:25 to 07:03:53 on Oct 12.

Looking at the logs, I suspect the following happened:

- a reexeced snapd was running
- snapd package update requested systemctl restart snapd.service
- the snapd service hit a kill timeout (unclear why, there is a log that indicates a graceful shutdown did not happen after 25s, then systemd proceeds to kill the process, which exits due to SIGTERM)
- snapd.failure.service kicks in, calls systemctl reset-failed on snapd.service
- snapd.failure starts 'previous' snapd with SNAPD_REVERT_TO_REV, but that has no meaning, because on the last run, there was no update of snapd happening
- package update calls systemctl start snapd

<at this point there are 2 snapd processes running, one under snap-failure, another started by postinst, state gets corrupted with both snapd processes modifying it>

- some time later (oct. 14 13:04:58) a refresh occurs in the process run from snap-failure, snapd exits, snapd.failure service finishes successfully
- snapd service is restarted and operates for a while
- reboot
- snapd service attempts to refresh the snapd snap, but is confused by the state not matching actual situation in the system

The problems we need to address is:
- make sure that is only one instance of snapd running
- investigate why snapd shutdown gets killed
- make sure that snapd started by snap-failure acts correctly when there was no snapd update

Changed in snapd:
importance: Undecided → High
status: New → Triaged
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers