lxd interaction blocked until snapd was restarted

Bug #1873004 reported by Christian Ehrhardt 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd (Ubuntu)
New
Undecided
Unassigned

Bug Description

Hi,
for the second time in ~1 month I found my lxd on an s390x system to hang.

If you'd run
 $ lxc <anything>
that would hang.

Resolved by:
 $ systemctl restart snapd

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I asked what logs you'd need and got this (thanks stgraber):
[16:34] <stgraber> 'ps fauxww', 'systemctl -a', 'snap changes' and the journal output would be a good start I think

For now I only have the system with the issue resolved (after the snapd restart).
If I find it again in that state (I don't know what triggers it) I'll run it before that.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm ok to keep this bug incomplete until I hit it again if the logs as-is don't provide anything useful yet.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

from mvo, next time also gather:
   SNAPD_DEBUG=1 lxc ...

Revision history for this message
Michael Vogt (mvo) wrote :

Thanks for your bugreport. I suspect

If you have the system in this state, does "SNAPD_DEBUG=1 lxc ..." show anything useful? can you attach the output of that as well?

Revision history for this message
Michael Vogt (mvo) wrote :

UPs, sorry - I meant to write in #9: "I suspect that it is related to the system-key generation. There might be a mismatch and snap run waits for re-generation of security profiles".

Revision history for this message
Michael Vogt (mvo) wrote :

Reading #9 again it sounds hostile - the sentence really was just cut off at the wrong poing for some reason, this was meant to read what I wrote in #10. Just wanted to clarify, I do appreciated the bugreport and the sentence should have continued.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Snapd restart will notice and re-generate the system-key.

Can you please obtain timestamps of /var/lib/snapd/system-key? Can you correlate the timestamps to the moment you restarted snapd?

When system key mismatch is detected "snap run" will do two things, make a API call to snapd (triggering socket activation) and wait for the system key to reconcile (waiting for snapd to write the correct system key to disk).

During that time all snap programs will appear to hang on startup.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

stat /var/lib/snapd/system-key
  File: /var/lib/snapd/system-key
  Size: 589 Blocks: 8 IO Block: 4096 regular file
Device: fd0ah/64778d Inode: 656044 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-04-15 10:28:47.215747475 -0400
Modify: 2020-04-15 10:28:45.845747475 -0400
Change: 2020-04-15 10:28:45.855747475 -0400
 Birth: -

That matches the restart of the service:
  Active: active (running) since Wed 2020-04-15 10:28:45 EDT; 4h 49min ago

But well, we knew it will be regenerated on restart - so this isn't entirely new.
The question is if it is out of sync when the issue occurs.

I'll run with SNAPD_DEBUG=1 once it happens again.
Anything else for the system-key to check at that time?

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

The system key is only rewritten if this is required. Normally it is not modified.

The question now is: what has changed that made the system key differ?

One thing we did observe was /snap/core/$current changing without a complete snapd re-start. Perhaps something was pinging snapd over the API and it failed to complete a restart after a core refresh?

Can you check with "snap changes" if there was a corresponding refresh of "core" or "snapd" sometime earlier than the system-key rewrite and in close proximity to the moment LXD started hanging?

System key contains interesting details of the system including build IDs of "snapd", some kernel and userspace library features. One other possibility is update of seccomp or apparmor that might have happened. Is "/usr/lib/snapd/snap-seccomp" and "/snap/core/current/usr/lib/snapd/snap-seccomp" linked statically on s390x?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.3 KiB)

The seccomp tools are dynamically linked:

$ file /usr/lib/snapd/snap-seccomp /snap/core/current/usr/lib/snapd/snap-seccomp
/usr/lib/snapd/snap-seccomp: ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), dynamically linked, interpreter /lib/ld64.so.1, BuildID[sha1]=ac22cf91f26d3e3f32132a8f717a22af7caf49fc, for GNU/Linux 3.2.0, stripped
/snap/core/current/usr/lib/snapd/snap-seccomp: ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), dynamically linked, interpreter /lib/ld64.so.1, for GNU/Linux 3.2.0, Go BuildID=wD-y12JBR_fQqHU08D57/n6hBHyUsBFuJ5Czn6fF7/JXIu77JtC7eIGO4BPOpW/1cKteqVwlvsiXC73WA1t, BuildID[sha1]=ed5cbba6dba5be3b12f065fca9981725da2c75cc, stripped

As for updates, there were apt updates, followed by reloads (probably the configure step).
That eventually restarted the snap daemon it seems.

Apr 15 08:43:50 s1lp05 systemd[1]: Stopped Daily apt download activities.
Apr 15 08:43:50 s1lp05 systemd[1]: Stopping Daily apt download activities.
Apr 15 08:43:50 s1lp05 systemd[1]: Started Daily apt download activities.
Apr 15 08:43:52 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:52 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:52 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:52 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:52 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:52 s1lp05 systemd[1]: Reloading.
Apr 15 08:43:52 s1lp05 systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/>
Apr 15 08:43:52 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:53 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:53 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:53 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:53 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:53 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:53 s1lp05 dbus-daemon[3425]: [system] Reloaded configuration
Apr 15 08:43:57 s1lp05 systemd[1]: Stopping Snap Daemon...
Apr 15 08:43:57 s1lp05 snapd[2423936]: main.go:155: Exiting on terminated signal.
Apr 15 08:43:57 s1lp05 systemd[1]: snapd.service: Succeeded.
Apr 15 08:43:57 s1lp05 systemd[1]: Stopped Snap Daemon.
Apr 15 08:43:57 s1lp05 systemd[1]: snapd.socket: Succeeded.
Apr 15 08:43:57 s1lp05 systemd[1]: Closed Socket activation for snappy daemon.

That was per /var/log/dpkg.log
2020-04-15 08:43:57 upgrade snapd:s390x 2.44+20.04 2.44.3+20.04

Core was refreshed on 1st of April I think:
ubuntu@s1lp05:~$ snap info core
...
refresh-date: 20 days ago, at 18:14 EDT
channels:
  latest/stable: 16-2.44.1 2020-03-31 (8932) 89MB -
  latest/candidate: 16-2.44.3 2020-04-14 (9064) 89MB -
  latest/beta: 16-2.44.3 2020-04-10 (9064) 89MB -
  latest/edge: 16-2.44.3+git1757.66c7369 2020-04-20 (9125) 92MB -
installed: 16-2.44.1 (8932) 89MB core
ubuntu@s1lp05:~$ date
Mon Apr...

Read more...

no longer affects: lxd (Ubuntu)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.