core snap with configure hook fails for some people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Snappy | Fix Released | Critical | Unassigned | |
Bug Description
We have reports of the core snap failing on refresh with the following error message:
"""
download-snap: Undoing
validate-snap: Done
mount-snap: Undo
stop-snap-services: Undo
remove-aliases: Undo
unlink-
copy-snap-data: Undo
setup-profiles: Undo
link-snap: Undo
INFO Requested daemon restart.
set-auto-aliases: Undo
setup-aliases: Undoing
start-snap-
cleanup: Done
run-hook: Error
ERROR error: cannot find installed snap \"core\" at revision unset
"""
The reason for this error is unknown. It may be related to the snapd restart. It is probably not related to the ubuntu-core->core transition, because we have reports of failures on systems that do not have a transition counter in the state.
In snapd < 2.22.7 the error message from cmd/snap/cmd_run.go for hooks was very generic and was just prefixed with "error:", so we don't know for certain, but we suspect it's coming from the "snap run --hook configure -r unset core" call that snapd generates when running the configure hook.
If it is restart related and we fix the restart, we will still have the problem that the snapd with the fix needs to get installed first before we run the hook. So a system that starts with 16.04.2 and snapd-2.21 will have a certain chance of failing on refresh.
description: | updated |
Changed in snappy: | |
status: | In Progress → Fix Committed |
Here is my current theory:
1. We have a bug in that the "snap" command fired from within hooks is not being re-executed, so we have a new snapd calling out into an old snap. This happens because when snapd is re-executed it resets the flag, so any nested calls that rely on the external PATH won't re-exec. To fix this, I suggest not even fiddling with the env variable, but rather finding out the right location for the snap binary within the current core snap and running it from there.
2. Very old snap commands (<= 2.11) did not handle the "unset" revision properly by reading out what the current revision is, instead relying on command line input alone:
https://github.com/snapcore/snapd/blob/2.11/cmd/snap/cmd_run.go#L93
The error message we see is from this line, when "unset" is at hand:
https://github.com/snapcore/snapd/blob/2.11/cmd/snap/cmd_run.go#L115
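To make point 2 concrete, here is a minimal sketch of treating "unset" as "look up the current revision" instead of taking it literally. This is not the snapd code: the `installed` map and the `resolveRevision` helper are hypothetical stand-ins for snapd's state and lookup logic, and the revision number is made up.

```go
package main

import "fmt"

// installed is a hypothetical stand-in for snapd's state:
// it maps a snap name to its current revision.
type installed map[string]string

// resolveRevision returns the revision to run. An empty or "unset"
// request means "whatever is current", so it is looked up in the
// state rather than passed through as a literal revision.
func resolveRevision(snaps installed, name, requested string) (string, error) {
	if requested != "" && requested != "unset" {
		return requested, nil
	}
	cur, ok := snaps[name]
	if !ok {
		return "", fmt.Errorf("cannot find current revision for snap %q", name)
	}
	return cur, nil
}

func main() {
	// The revision below is illustrative only.
	snaps := installed{"core": "1577"}
	rev, err := resolveRevision(snaps, "core", "unset")
	fmt.Println(rev, err)
}
```

An old snap command that skips this lookup would go searching for a revision literally named "unset", which matches the error text in the report.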
The real bug is (1), since the stable version has had the fix for a very long time. If people re-exec into a snapd that handles that properly, (2) should be gone. If people don't have snapd set up to re-exec, then the only way to update snapd is via the system packaging anyway.