refresh of snapd failed

Bug #1867616 reported by Zygmunt Krynicki
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Fix Released
Undecided
Maciej Borzecki

Bug Description

$USERNAME@$HOSTNAME:~$ sudo snap refresh
2020-03-16T12:50:29+01:00 INFO Waiting for restart...
2020-03-16T12:50:33+01:00 ERROR there was a snapd rollback across the restart
error: cannot perform the following tasks:
- Automatically connect eligible plugs and slots of snap "snapd" (there was a snapd rollback across the restart)

Relevant journal fragment:

-- Reboot --
Mar 09 18:20:04 $HOSTNAME systemd[1]: Starting Snappy daemon...
Mar 09 18:20:10 $HOSTNAME snapd[680]: AppArmor status: apparmor is enabled but some kernel features are missing: dbus, network
Mar 09 18:20:11 $HOSTNAME snapd[680]: daemon.go:343: started snapd/2.43.2+git874.g156c191 (series 16; devmode) ubuntu-core/18 (arm64) linux/4.19.59-rt24-icx20.
Mar 09 18:20:12 $HOSTNAME snapd[680]: daemon.go:436: adjusting startup timeout by 1m15s (pessimistic estimate of 30s plus 5s per snap)
Mar 09 18:20:14 $HOSTNAME systemd[1]: Started Snappy daemon.
Mar 09 18:21:08 $HOSTNAME snapd[680]: storehelpers.go:438: cannot refresh: snap has no updates available: "core", "core18"
Mar 16 12:50:02 $HOSTNAME systemd[1]: snapd.service: Current command vanished from the unit file, execution of the command list won't be resumed.
Mar 16 12:50:04 $HOSTNAME snapd[680]: daemon.go:537: gracefully waiting for running hooks
Mar 16 12:50:04 $HOSTNAME snapd[680]: daemon.go:539: done waiting for running hooks
Mar 16 12:50:29 $HOSTNAME snapd[680]: cannot run daemon: context deadline exceeded
Mar 16 12:50:29 $HOSTNAME systemd[1]: snapd.service: Main process exited, code=exited, status=1/FAILURE
Mar 16 12:50:29 $HOSTNAME systemd[1]: snapd.service: Failed with result 'exit-code'.
Mar 16 12:50:29 $HOSTNAME systemd[1]: snapd.service: Triggering OnFailure= dependencies.
Mar 16 12:50:30 $HOSTNAME systemd[1]: snapd.service: Service hold-off time over, scheduling restart.
Mar 16 12:50:30 $HOSTNAME systemd[1]: snapd.service: Scheduled restart job, restart counter is at 1.

$USERNAME@$HOSTNAME:~$ snap change 207
Status Spawn Ready Summary
Done 7 days ago, at 18:21 CET today at 12:51 CET Ensure prerequisites for "snapd" are available
Undone 7 days ago, at 18:21 CET today at 12:51 CET Download snap "snapd" (6732) from channel "latest/edge"
Done 7 days ago, at 18:21 CET today at 12:51 CET Fetch and check assertions for snap "snapd" (6732)
Undone 7 days ago, at 18:21 CET today at 12:51 CET Mount snap "snapd" (6732)
Undone 7 days ago, at 18:21 CET today at 12:51 CET Run pre-refresh hook of "snapd" snap if present
Undone 7 days ago, at 18:21 CET today at 12:51 CET Stop snap "snapd" services
Undone 7 days ago, at 18:21 CET today at 12:50 CET Remove aliases for snap "snapd"
Undone 7 days ago, at 18:21 CET today at 12:50 CET Make current revision for snap "snapd" unavailable
Undone 7 days ago, at 18:21 CET today at 12:50 CET Copy snap "snapd" data
Undone 7 days ago, at 18:21 CET today at 12:50 CET Setup snap "snapd" (6732) security profiles
Undone 7 days ago, at 18:21 CET today at 12:50 CET Make snap "snapd" (6732) available to the system
Error 7 days ago, at 18:21 CET today at 12:50 CET Automatically connect eligible plugs and slots of snap "snapd"
Hold 7 days ago, at 18:21 CET today at 12:50 CET Set automatic aliases for snap "snapd"
Hold 7 days ago, at 18:21 CET today at 12:50 CET Setup snap "snapd" aliases
Hold 7 days ago, at 18:21 CET today at 12:50 CET Run post-refresh hook of "snapd" snap if present
Hold 7 days ago, at 18:21 CET today at 12:50 CET Start snap "snapd" (6732) services
Hold 7 days ago, at 18:21 CET today at 12:50 CET Remove data for snap "snapd" (5648)
Hold 7 days ago, at 18:21 CET today at 12:50 CET Remove snap "snapd" (5648) from the system
Hold 7 days ago, at 18:21 CET today at 12:50 CET Clean up "snapd" (6732) install
Hold 7 days ago, at 18:21 CET today at 12:50 CET Run health check of "snapd" snap
Done 7 days ago, at 18:21 CET today at 12:51 CET Consider re-refresh of "snapd"

......................................................................
Make current revision for snap "snapd" unavailable

2020-03-16T12:50:53+01:00 INFO Requested daemon restart (snapd snap).

......................................................................
Make snap "snapd" (6732) available to the system

2020-03-16T12:50:04+01:00 INFO Requested daemon restart (snapd snap).

......................................................................
Automatically connect eligible plugs and slots of snap "snapd"

2020-03-16T12:50:27+01:00 INFO Waiting for restart...
2020-03-16T12:50:28+01:00 INFO Waiting for restart...
2020-03-16T12:50:28+01:00 INFO Waiting for restart...
2020-03-16T12:50:28+01:00 INFO Waiting for restart...
2020-03-16T12:50:28+01:00 INFO Waiting for restart...
2020-03-16T12:50:29+01:00 INFO Waiting for restart...
2020-03-16T12:50:29+01:00 INFO Waiting for restart...
2020-03-16T12:50:29+01:00 INFO Waiting for restart...
2020-03-16T12:50:29+01:00 INFO Waiting for restart...
2020-03-16T12:50:33+01:00 ERROR there was a snapd rollback across the restart

Zygmunt Krynicki (zyga)
description: updated
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

This happened again after new snapd came out:

$ sudo snap refresh
Consider re-refresh of "snapd" /
2020-03-16T15:38:06+01:00 INFO Waiting for restart...
error: cannot perform the following tasks:
- Automatically connect eligible plugs and slots of snap "snapd" (there was a snapd rollback across the restart)

Revision history for this message
Zygmunt Krynicki (zyga) wrote :
Download full text (5.4 KiB)

$ snap change 233 # the one that failed
Status Spawn Ready Summary
Done today at 15:36 CET today at 15:38 CET Ensure prerequisites for "snapd" are available
Undone today at 15:36 CET today at 15:38 CET Download snap "snapd" (6775) from channel "latest/edge"
Done today at 15:36 CET today at 15:38 CET Fetch and check assertions for snap "snapd" (6775)
Undone today at 15:36 CET today at 15:38 CET Mount snap "snapd" (6775)
Undone today at 15:36 CET today at 15:38 CET Run pre-refresh hook of "snapd" snap if present
Undone today at 15:36 CET today at 15:38 CET Stop snap "snapd" services
Undone today at 15:36 CET today at 15:38 CET Remove aliases for snap "snapd"
Undone today at 15:36 CET today at 15:38 CET Make current revision for snap "snapd" unavailable
Undone today at 15:36 CET today at 15:38 CET Copy snap "snapd" data
Undone today at 15:36 CET today at 15:38 CET Setup snap "snapd" (6775) security profiles
Undone today at 15:36 CET today at 15:38 CET Make snap "snapd" (6775) available to the system
Error today at 15:36 CET today at 15:38 CET Automatically connect eligible plugs and slots of snap "snapd"
Hold today at 15:36 CET today at 15:38 CET Set automatic aliases for snap "snapd"
Hold today at 15:36 CET today at 15:38 CET Setup snap "snapd" aliases
Hold today at 15:36 CET today at 15:38 CET Run post-refresh hook of "snapd" snap if present
Hold today at 15:36 CET today at 15:38 CET Start snap "snapd" (6775) services
Hold today at 15:36 CET today at 15:38 CET Remove data for snap "snapd" (5757)
Hold today at 15:36 CET today at 15:38 CET Remove snap "snapd" (5757) from the system
Hold today at 15:36 CET today at 15:38 CET Clean up "snapd" (6775) install
Hold today at 15:36 CET today at 15:38 CET Run health check of "snapd" snap
Done today at 15:36 CET today at 15:38 CET Consider re-refresh of "snapd"

......................................................................
Make current revision for snap "snapd" unavailable

2020-03-16T15:38:26+01:00 INFO Requested daemon restart (snapd snap).

......................................................................
Make snap "snapd" (6775) available to the system

2020-03-16T15:37:41+01:00 INFO Requested daemon restart (snapd snap).

......................................................................
Automatically connect eligible plugs and slots of snap "snapd"

2020-03-16T15:38:04+01:00 INFO Waiting for restart...
2020-03-16T15:38:04+01:00 INFO Waiting for restart...
2020-03-16T15:38:04+01:00 INFO Waiting for restart...
2020-03-16T15:38:05+01:00 INFO Waiting for restart...
2020-03-16T15:38:05+01:00 INFO Waiting for restart...
2020-03-16T15:38:05+01:00 INFO Waiting for restart...
2020-03-16T15:38:06+01:00 INFO Waiting for restart...
2020-03-16T15:38:06+01:00 INFO Waiting for restart...
2020-03-16T15:38:06+01:00 INFO Waiting for restart...
2020-03-16T15:38:09+01:00 ERROR there was a snapd rollback across the restart

Compare this with

$ snap change 241 # the one that worked
Status Spawn Ready Summary
Done today at 15:41 CET today at 15:41 CET Ens...

Read more...

Revision history for this message
Zygmunt Krynicki (zyga) wrote :
Download full text (10.8 KiB)

Journal from the current boot:

Mar 16 13:06:37 $HOSTNAME systemd[1]: Starting Snap Daemon...
Mar 16 13:06:41 $HOSTNAME snapd[725]: AppArmor status: apparmor is enabled and all features are available
Mar 16 13:06:41 $HOSTNAME snapd[725]: daemon.go:343: started snapd/2.44~pre2+git1424.gc370566 (series 16) ubuntu-core/18 (arm64) linux/4.19.59-rt24-icx23.
Mar 16 13:06:41 $HOSTNAME snapd[725]: daemon.go:436: adjusting startup timeout by 1m15s (pessimistic estimate of 30s plus 5s per snap)
Mar 16 13:07:32 $HOSTNAME systemd[1]: Started Snap Daemon.
Mar 16 13:12:05 $HOSTNAME snapd[725]: devicestate.go:177: installing unasserted gadget "rexroth-arch01-hw"
Mar 16 13:15:30 $HOSTNAME snapd[725]: udevmon.go:149: udev event error: Unable to parse uevent, err: no buffer space available
Mar 16 13:19:25 $HOSTNAME snapd[725]: storehelpers.go:438: cannot refresh: snap has no updates available: "core", "core18", "snapd"
Mar 16 13:27:33 $HOSTNAME snapd[725]: storehelpers.go:438: cannot refresh: snap has no updates available: "core", "core18", "snapd"
Mar 16 13:27:33 $HOSTNAME snapd[725]: autorefresh.go:397: auto-refresh: all snaps are up-to-date
Mar 16 15:36:57 $HOSTNAME snapd[725]: storehelpers.go:438: cannot refresh: snap has no updates available: "core", "core18"
Mar 16 15:37:40 $HOSTNAME systemd[1]: snapd.service: Current command vanished from the unit file, execution of the command list won't be resumed.
Mar 16 15:37:41 $HOSTNAME snapd[725]: daemon.go:539: gracefully waiting for running hooks
Mar 16 15:37:41 $HOSTNAME snapd[725]: daemon.go:541: done waiting for running hooks
Mar 16 15:38:06 $HOSTNAME snapd[725]: cannot run daemon: context deadline exceeded
Mar 16 15:38:06 $HOSTNAME systemd[1]: snapd.service: Main process exited, code=exited, status=1/FAILURE
Mar 16 15:38:06 $HOSTNAME systemd[1]: snapd.service: Failed with result 'exit-code'.
Mar 16 15:38:06 $HOSTNAME systemd[1]: snapd.service: Triggering OnFailure= dependencies.
Mar 16 15:38:07 $HOSTNAME systemd[1]: snapd.service: Service hold-off time over, scheduling restart.
Mar 16 15:38:07 $HOSTNAME systemd[1]: snapd.service: Scheduled restart job, restart counter is at 1.
Mar 16 15:38:07 $HOSTNAME systemd[1]: Stopped Snap Daemon.
Mar 16 15:38:07 $HOSTNAME systemd[1]: Starting Snap Daemon...
Mar 16 15:38:07 $HOSTNAME systemd[1]: Stopped Snap Daemon.
Mar 16 15:38:23 $HOSTNAME systemd[1]: Starting Snap Daemon...
Mar 16 15:38:23 $HOSTNAME snapd[2386]: AppArmor status: apparmor is enabled and all features are available
Mar 16 15:38:23 $HOSTNAME snapd[2386]: daemon.go:343: started snapd/2.44~pre2+git1424.gc370566 (series 16) ubuntu-core/18 (arm64) linux/4.19.59-rt24-icx23.
Mar 16 15:38:24 $HOSTNAME snapd[2386]: daemon.go:436: adjusting startup timeout by 1m15s (pessimistic estimate of 30s plus 5s per snap)
Mar 16 15:38:24 $HOSTNAME systemd[1]: Started Snap Daemon.
Mar 16 15:38:26 $HOSTNAME snapd[2386]: daemon.go:539: gracefully waiting for running hooks
Mar 16 15:38:27 $HOSTNAME snapd[2386]: daemon.go:541: done waiting for running hooks
Mar 16 15:38:32 $HOSTNAME systemd[1]: snapd.service: Service hold-off time over, scheduling restart.
Mar 16 15:38:32 $HOSTNAME systemd[1]: snapd.service: Scheduled restart jo...

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

From looking at the source code and the logs, I believe this is what happens:
- snapd snap is updated
- snapd requests a restart
- we reach the code in daemon.Stop() where we attempt a graceful shutdown of http connections
- the shutdown code reaches a default timeout (25s, matches with the log), while there's still active connections
  - the listen socket is closed as the first step, so no new connections are possible
- the error of the tomb is set and propagated up the call stack
- snapd exits with an error triggering a failover handling which reverts the new revision

I suspect there may be a slow client hanging on the snapd API socket that keeps the connection alive. We should consider swallowing up the error like we do for system restarts, or using a forceful http.Server.Close() when the timeout is hit.

Zygmunt Krynicki (zyga)
summary: - refresh of snapd failed, maybe because of concurrent refresh of core
+ refresh of snapd failed
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

I've opened https://github.com/snapcore/snapd/pull/8275 which should address the problem.

Changed in snapd:
assignee: nobody → Maciej Borzecki (maciek-borzecki)
status: New → Fix Committed
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

This was released as a part of snapd 2.44

Changed in snapd:
milestone: none → 2.45
milestone: 2.45 → 2.44
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.