core snap with configure hook fails for some people

Bug #1668738 reported by Michael Vogt
This bug affects 1 person
Affects: Snappy
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

We have reports of the core snap failing on refresh with the following error message:

"""
download-snap: Undoing
validate-snap: Done
mount-snap: Undo
stop-snap-services: Undo
remove-aliases: Undo
unlink-current-snap: Undo
copy-snap-data: Undo
setup-profiles: Undo
link-snap: Undo
 INFO Requested daemon restart.
set-auto-aliases: Undo
setup-aliases: Undoing
start-snap-services: Undone
cleanup: Done
run-hook: Error
 ERROR error: cannot find installed snap "core" at revision unset
"""

The reason for this error is unknown. It may be related to the snapd restart. It is probably not related to the ubuntu-core -> core transition, because we have reports of failures on systems that do not have a transition counter in their state.

In snapd < 2.22.7 the error message from cmd/snap/cmd_run.go for hooks was very generic: it was just prefixed with "error:". So we don't know for sure, but we suspect it is coming from the "snap run --hook configure -r unset core" call that snapd generates when running the configure hook.

If it is restart-related and we fix the restart, we will still have the problem that the snapd with the fix needs to be installed first, before we run the hook. So a system that starts with 16.04.2 and snapd-2.21 has a certain chance of failing on refresh.

Michael Vogt (mvo)
description: updated
Gustavo Niemeyer (niemeyer) wrote :

Here is my current theory:

1. We have a bug where the "snap" command fired from within hooks is not being re-executed, so a new snapd ends up calling out into an old snap. This happens because when snapd is re-executed it resets the flag, so any nested calls that rely on the external PATH won't re-exec. To fix this, I suggest not even fiddling with the env variable, but rather finding out the right location of the snap binary within the current core snap and running it from there.

2. Very old snap commands (<= 2.11) did not handle the "unset" revision properly: instead of reading out what the current revision is, they relied on the command line input alone:

    https://github.com/snapcore/snapd/blob/2.11/cmd/snap/cmd_run.go#L93

The error message we see comes from this line when "unset" is passed:

    https://github.com/snapcore/snapd/blob/2.11/cmd/snap/cmd_run.go#L115
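To make point (2) concrete, here is a minimal Go sketch contrasting the two behaviors. It is not the snapd 2.11 code: the `installedSnap` type and the `oldResolve`/`newResolve` functions are hypothetical simplifications, assuming only that a snap has a current revision and a set of installed revisions. Passing the literal string "unset" through unchanged reproduces the tail of the error seen in the log, while treating "unset" as "use the current revision" resolves cleanly.

```go
package main

import (
	"fmt"
	"strconv"
)

// installedSnap is a hypothetical, simplified stand-in for snapd's state:
// the currently active revision plus every revision present on disk.
type installedSnap struct {
	current   int
	revisions map[int]bool
}

// oldResolve mimics the behavior described for snap <= 2.11: the revision
// from the command line is used as-is, so "unset" can never match.
func oldResolve(snaps map[string]installedSnap, name, rev string) (int, error) {
	s, ok := snaps[name]
	n, err := strconv.Atoi(rev)
	if !ok || err != nil || !s.revisions[n] {
		return 0, fmt.Errorf("cannot find installed snap %q at revision %s", name, rev)
	}
	return n, nil
}

// newResolve treats "unset" as "whatever revision is current" and only
// falls back to the literal lookup for explicit revisions.
func newResolve(snaps map[string]installedSnap, name, rev string) (int, error) {
	s, ok := snaps[name]
	if !ok {
		return 0, fmt.Errorf("cannot find installed snap %q", name)
	}
	if rev == "unset" {
		return s.current, nil
	}
	return oldResolve(snaps, name, rev)
}

func main() {
	snaps := map[string]installedSnap{
		"core": {current: 1234, revisions: map[int]bool{1234: true}},
	}
	_, err := oldResolve(snaps, "core", "unset")
	fmt.Println(err) // cannot find installed snap "core" at revision unset
	rev, _ := newResolve(snaps, "core", "unset")
	fmt.Println(rev) // 1234
}
```

The revision number 1234 is made up for illustration; the point is only that the old path never consults the current revision.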

The real bug is (1), since the stable version has had the fix for a very long time. If people re-exec into a snapd that handles this properly, (2) should be gone. If people don't have snapd set up to re-exec, then the only way to update snapd is via the system packaging anyway.

Gustavo Niemeyer (niemeyer) wrote :

Complementing (1), we should only look up the "snap" binary path within the core snap if snapd itself (the one running and making the decision) comes from there too. Otherwise we may end up with the reverse discrepancy: an external snapd calling into an internal snap.
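The fix proposed in (1), together with the restriction above, could be sketched in Go as follows. This is an illustration under assumptions, not snapd's actual re-exec code: it assumes the core snap is mounted at /snap/core/<revision> with snapd at usr/lib/snapd/snapd and the snap CLI at usr/bin/snap inside it, and `snapCommandFor` is a hypothetical helper that takes the running snapd's path (for example from os.Executable()). The core snap's snap binary is chosen only when snapd itself runs from the core snap; otherwise the plain "snap" name is returned so the usual PATH lookup applies.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Assumed layout of the core snap (hypothetical constants for this sketch).
const (
	coreMountDir  = "/snap/core"
	snapdInCore   = "usr/lib/snapd/snapd"
	snapCLIInCore = "usr/bin/snap"
)

// snapCommandFor returns the "snap" binary that hooks should call, given
// the path of the running snapd executable.
func snapCommandFor(snapdExe string) string {
	clean := filepath.Clean(snapdExe)
	prefix := coreMountDir + "/"
	suffix := "/" + snapdInCore
	if strings.HasPrefix(clean, prefix) && strings.HasSuffix(clean, suffix) {
		// root is the mount point of this core revision, e.g. /snap/core/1234.
		root := strings.TrimSuffix(clean, suffix)
		rev := strings.TrimPrefix(root, prefix)
		// Require exactly one path element (the revision) after /snap/core.
		if rev != "" && !strings.Contains(rev, "/") {
			// snapd runs from the core snap: use the snap CLI from the same revision.
			return filepath.Join(root, snapCLIInCore)
		}
	}
	// snapd does not come from the core snap: keep the existing PATH lookup,
	// so an external snapd never calls into an internal snap.
	return "snap"
}

func main() {
	fmt.Println(snapCommandFor("/snap/core/1234/usr/lib/snapd/snapd")) // /snap/core/1234/usr/bin/snap
	fmt.Println(snapCommandFor("/usr/lib/snapd/snapd"))                // snap
}
```

Deriving the path from snapd's own location, rather than from an environment variable, keeps the snap CLI and snapd from the same revision paired.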

Michael Vogt (mvo) wrote :
Changed in snappy:
importance: Undecided → Critical
status: New → In Progress
Michael Vogt (mvo)
Changed in snappy:
status: In Progress → Fix Committed
Zygmunt Krynicki (zyga) wrote :

I'm marking this as fix released since we landed a number of fixes into this area (and then some more) and we haven't seen anything resembling this issue for a while.

Changed in snappy:
status: Fix Committed → Fix Released