Comment 4 for bug 1899794

Maciej Borzecki (maciek-borzecki) wrote :

As discussed on IRC with jibel, a package update took place from around 07:01:25 to 07:03:53 on Oct 12.

Looking at the logs, I suspect the following happened:

- a reexeced snapd was running
- snapd package update requested systemctl restart snapd.service
- the snapd service hit a kill timeout (it is unclear why; a log entry indicates that a graceful shutdown did not happen after 25s, so systemd proceeded to kill the process, which exited due to SIGTERM)
- snapd.failure.service kicks in, calls systemctl reset-failed on snapd.service
- snapd.failure starts the 'previous' snapd with SNAPD_REVERT_TO_REV, but that has no meaning, because no update of snapd happened on the last run
- package update calls systemctl start snapd

<at this point there are 2 snapd processes running, one under snap-failure and another started by postinst; the state gets corrupted as both snapd processes modify it>

- some time later (Oct 14 13:04:58) a refresh occurs in the process run from snap-failure, snapd exits, and the snapd.failure service finishes successfully
- snapd service is restarted and operates for a while
- reboot
- snapd service attempts to refresh the snapd snap, but is confused by the state not matching the actual situation in the system

The problems we need to address are:
- make sure that there is only one instance of snapd running
- investigate why snapd shutdown gets killed
- make sure that snapd started by snap-failure acts correctly when there was no snapd update
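
For the first item, one generic way to guarantee a single running instance is an exclusive, non-blocking advisory lock taken at startup. The sketch below is only an illustration of that idea in Python and is not how snapd does or would implement it; the function names, the AlreadyRunning exception and the lock path are all hypothetical, and it assumes a Linux system where flock is available.

```python
import fcntl
import os
import tempfile


class AlreadyRunning(Exception):
    """Raised when another holder already owns the instance lock."""


def acquire_instance_lock(path):
    """Take an exclusive, non-blocking flock on a (hypothetical) lock file.

    Returns the open file descriptor. The lock is held for as long as the
    descriptor stays open, and the kernel drops it when the process exits,
    so a killed daemon cannot leave a stale lock behind.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: refuse to start a second instance.
        os.close(fd)
        raise AlreadyRunning(path)
    return fd


def release_instance_lock(fd):
    """Explicitly unlock and close the descriptor on clean shutdown."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock location, used here only for demonstration.
    demo_path = os.path.join(tempfile.mkdtemp(), "daemon-demo.lock")
    fd = acquire_instance_lock(demo_path)
    try:
        acquire_instance_lock(demo_path)
    except AlreadyRunning:
        print("second instance refused to start")
    release_instance_lock(fd)
```

With a guard like this, a daemon started by postinst while the one from snap-failure is still alive would exit early instead of modifying the same state.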