snapd crashed and can't start: "Re-refresh task has 1 tasks waiting for it"

Bug #1860324 reported by Colin Watson on 2020-01-20
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Undecided
Unassigned

Bug Description

I noticed that I had a snap that was stuck on an old revision for some reason and attempted to force it to refresh to stable:

$ snap info gitlptools
name: gitlptools
[...]
snap-id: LyyEad7cnNXM2rmBu9ySp7MJDWTKfAsF
tracking: latest/stable
refresh-date: 2 days ago, at 16:24 GMT
channels:
  stable: b9b9618 2019-02-20 (36) 14MB classic
  candidate: ↑
  beta: ↑
  edge: 0+git.cfe2bdc-dirty 2017-12-19 (3) 14MB classic
installed: 0+git.a2477ff (9) 14MB classic
$ snap refresh --stable gitlptools
error: cannot communicate with server: Get http://localhost/v2/changes/1277: dial unix /run/snapd.socket: connect: connection refused

Now snapd is stopped and can't start back up at all. syslog says:

Jan 20 09:54:37 niejwein systemd[1]: Starting Snappy daemon...
Jan 20 09:54:37 niejwein snapd[9861]: AppArmor status: apparmor is enabled and all features are available
Jan 20 09:54:37 niejwein snapd[9861]: AppArmor status: apparmor is enabled and all features are available
Jan 20 09:54:37 niejwein snapd[9861]: daemon.go:346: started snapd/2.43.1 (series 16; classic) ubuntu/18.04 (amd64) linux/4.15.0-72-generic.
Jan 20 09:54:37 niejwein snapd[9861]: daemon.go:439: adjusting startup timeout by 2m35s (pessimistic estimate of 30s plus 5s per snap)
Jan 20 09:54:38 niejwein systemd[1]: Started Snappy daemon.
Jan 20 09:54:38 niejwein snapd[9861]: handlers.go:2589: PANIC Re-refresh task has 1 tasks waiting for it.
Jan 20 09:54:38 niejwein snapd[9861]: panic: Re-refresh task has 1 tasks waiting for it.
Jan 20 09:54:38 niejwein snapd[9861]: goroutine 45 [running]:
Jan 20 09:54:38 niejwein snapd[9861]: github.com/snapcore/snapd/logger.Panicf(0x555e924913b6, 0x2c, 0xc420455e68, 0x1, 0x1)
Jan 20 09:54:38 niejwein snapd[9861]: #011/build/snapd-IqyPU7/snapd-2.43.1/_build/src/github.com/snapcore/snapd/logger/logger.go:67 +0x171
Jan 20 09:54:38 niejwein snapd[9861]: github.com/snapcore/snapd/overlord/snapstate.(*SnapManager).doCheckReRefresh(0xc4200a6690, 0xc4203cd9e0, 0xc4202a2d70, 0x0, 0x0)
Jan 20 09:54:38 niejwein snapd[9861]: #011/build/snapd-IqyPU7/snapd-2.43.1/_build/src/github.com/snapcore/snapd/overlord/snapstate/handlers.go:2589 +0x567
Jan 20 09:54:38 niejwein snapd[9861]: github.com/snapcore/snapd/overlord/snapstate.(*SnapManager).(github.com/snapcore/snapd/overlord/snapstate.doCheckReRefresh)-fm(0xc4203cd9e0, 0xc4202a2d70, 0x555e92e30320, 0x0)
Jan 20 09:54:38 niejwein snapd[9861]: #011/build/snapd-IqyPU7/snapd-2.43.1/_build/src/github.com/snapcore/snapd/overlord/snapstate/snapmgr.go:429 +0x40
Jan 20 09:54:38 niejwein snapd[9861]: github.com/snapcore/snapd/overlord/state.(*TaskRunner).run.func1(0x0, 0x0)
Jan 20 09:54:38 niejwein snapd[9861]: #011/build/snapd-IqyPU7/snapd-2.43.1/_build/src/github.com/snapcore/snapd/overlord/state/taskrunner.go:195 +0xbb
Jan 20 09:54:38 niejwein snapd[9861]: github.com/snapcore/snapd/vendor/gopkg.in/tomb%2ev2.(*Tomb).run(0xc4202a2d70, 0xc4204384b0)
Jan 20 09:54:38 niejwein snapd[9861]: #011/build/snapd-IqyPU7/snapd-2.43.1/_build/src/github.com/snapcore/snapd/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Jan 20 09:54:38 niejwein snapd[9861]: created by github.com/snapcore/snapd/vendor/gopkg.in/tomb%2ev2.(*Tomb).Go
Jan 20 09:54:38 niejwein snapd[9861]: #011/build/snapd-IqyPU7/snapd-2.43.1/_build/src/github.com/snapcore/snapd/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xbb
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Failed with result 'exit-code'.
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Triggering OnFailure= dependencies.
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Service hold-off time over, scheduling restart.
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Scheduled restart job, restart counter is at 5.
Jan 20 09:54:38 niejwein systemd[1]: Stopped Snappy daemon.
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Start request repeated too quickly.
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Failed with result 'exit-code'.
Jan 20 09:54:38 niejwein systemd[1]: Failed to start Snappy daemon.
Jan 20 09:54:38 niejwein systemd[1]: snapd.socket: Failed with result 'service-start-limit-hit'.
Jan 20 09:54:38 niejwein systemd[1]: snapd.service: Triggering OnFailure= dependencies.

Help?

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: snapd 2.42.1+18.04
ProcVersionSignature: Ubuntu 4.15.0-72.81-generic 4.15.18
Uname: Linux 4.15.0-72-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
CurrentDesktop: GNOME
Date: Mon Jan 20 09:56:18 2020
EcryptfsInUse: Yes
InstallationDate: Installed on 2013-03-01 (2515 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Alpha amd64 (20130223.1)
SourcePackage: snapd
UpgradeStatus: No upgrade log present (probably fresh install)

Colin Watson (cjwatson) wrote :

Can you provide the output of `snap changes` and `snap change --last=refresh`?

If snapd is dead, make sure to stop it via `systemctl stop snapd{.service,socket}` and then run:

snap debug state /var/lib/snapd/state.json
snap debug state /var/lib/snapd/state.json --change=<id>

to obtain the same information as snap change/changes commands.

Colin Watson (cjwatson) wrote :

snapd is indeed dead and refuses to start, but here's the "snap debug state" output:

$ sudo snap debug state /var/lib/snapd/state.json
ID Status Spawn Ready Label Summary
1271 Done 3 days ago, at 22:37 GMT 3 days ago, at 22:40 GMT auto-refresh Auto-refresh snaps "core", "juju"
1272 Done 2 days ago, at 15:59 GMT 2 days ago, at 15:59 GMT configure-snap Change configuration of "core" snap
1273 Done 2 days ago, at 16:24 GMT 2 days ago, at 16:24 GMT refresh-snap Refresh snap "gitlptools"
1274 Done 2 days ago, at 16:25 GMT 2 days ago, at 16:29 GMT refresh-snap Refresh "lxd" snap from "candidate" channel
1275 Done yesterday at 22:16 GMT yesterday at 22:16 GMT auto-refresh Auto-refresh snap "gitlptools"
1276 Done today at 09:51 GMT today at 09:51 GMT refresh-snap Refresh snap "gitlptools"
1277 Doing today at 09:52 GMT 0000-12-31 refresh-snap Refresh "gitlptools" snap
$ sudo snap debug state /var/lib/snapd/state.json --change=1277
Lanes ID Status Spawn Ready Kind Summary
0 8234 Done today at 09:52 GMT today at 09:52 GMT refresh-aliases Refresh aliases for snap "gitlptools"
0 8235 Doing today at 09:52 GMT 0000-12-31 check-rerefresh Consider re-refresh of "gitlptools"
0 8236 Do today at 09:52 GMT 0000-12-31 switch-snap-channel Switch snap "gitlptools" from channel "latest/stable" to "stable"

Colin Watson (cjwatson) wrote :

$ sudo cat /var/lib/snapd/state.json | jq '.data.snaps.gitlptools'
{
  "type": "app",
  "sequence": [
    {
      "name": "gitlptools",
      "snap-id": "LyyEad7cnNXM2rmBu9ySp7MJDWTKfAsF",
      "revision": "36",
      "channel": "stable",
      "title": "Launchpad tooling for Git",
      "summary": "Tools to make git work well with Launchpad",
      "description": "A collection of plugins to make working with git and Launchpad a nicer experience.\n\nCurrently ships:\n\n`git-lp-open` for opening the current branch in your browser\n\n`git-lp-propose` for creating a merge proposal for the current branch\n\n`git-lp-approve` to top-approve a merge proposal for the current branch."
    },
    {
      "name": "gitlptools",
      "snap-id": "LyyEad7cnNXM2rmBu9ySp7MJDWTKfAsF",
      "revision": "9",
      "channel": "latest/stable",
      "title": "Launchpad tooling for Git",
      "summary": "Tools to make git work well with Launchpad",
      "description": "A collection of plugins to make working with git and Launchpad a nicer experience.\n\nCurrently ships:\n\n`git-lp-open` for opening the current branch in your browser\n\n`git-lp-propose` for creating a merge proposal for the current branch\n\n`git-lp-approve` to top-approve a merge proposal for the current branch."
    }
  ],
  "active": true,
  "current": "9",
  "channel": "latest/stable",
  "classic": true,
  "aliases": {
    "git-lp-open": {
      "auto": "git-lp-open"
    },
    "git-lp-propose": {
      "auto": "git-lp-propose"
    }
  },
  "user-id": 3
}

Paweł Stołowski (stolowski) wrote :

I had a quick look and it seems that UpdateWithDeviceContext can append "snap-switch-channel" to taskset from doUpdate(), which means a task gets scheduled after "check-rerefresh" (which panics if it finds any tasks waiting for it). AFAIR "check-rerefresh" was meant to be the last in the change.

John Lenton (chipaca) wrote :

If you're feeling brave and need to get unstuck, edit the state.json to remove the check-rerefresh task and you should be able to start snapd again.

John Lenton (chipaca) wrote :

Easier would be to set it to Done instead of removing it though

John Lenton (chipaca) wrote :

(to be clear that means set status to 4 in the json)

After banging my head against the unit tests, I got the PR up that reproduces this scenario: https://github.com/snapcore/snapd/pull/8024 The fix will end up in the same PR at some point.

John Lenton (chipaca) on 2020-01-22
affects: snapd (Ubuntu) → snapd
Changed in snapd:
status: New → In Progress
assignee: nobody → John Lenton (chipaca)
Zygmunt Krynicki (zyga) on 2020-02-04
Changed in snapd:
assignee: John Lenton (chipaca) → nobody
status: In Progress → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers