lxd-installer can race or temp-fail and then block itself

Bug #2039148 reported by Christian Ehrhardt 
This bug affects 1 person
Affects: lxd-installer (Ubuntu), status tracked in Oracular

Series    Status        Importance  Assigned to  Milestone
Focal     Confirmed     Undecided   Unassigned
Jammy     Confirmed     Undecided   Unassigned
Mantic    Won't Fix     Undecided   Unassigned
Noble     Confirmed     Undecided   Unassigned
Oracular  Fix Released  Undecided   Unassigned

Bug Description

Hey,
while checking on another issue I noticed that pre-installed LXD does not always work.
It is fully pre-installed on cloud images and server installs to give users quick access to a great feature.

In minimal images it is not installed (fine, since they are meant to be minimal), yet it is not fully gone either, and what is left fails without a clear indication to the (uninformed) user.
What is left there is `lxd-installer`.

Normal image:

```
$ snap list | grep lxd
lxd 5.0.2-838e1b2 24322 5.0/stable/… canonical** -
$ which lxc
/snap/bin/lxc
$ lxc list
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
```

A minimal image, on the other hand, has this:

```
$ which lxc
/usr/sbin/lxc

$ dpkg -S /usr/sbin/lxc
dpkg-query: no path found matching pattern /usr/sbin/lxc

$ cat /usr/sbin/lxc
#!/bin/sh
SNAP_BIN="/snap/bin/$(basename $0)"
if [ ! -f ${SNAP_BIN} ]; then
    python3 -c 'import socket; s=socket.socket(socket.AF_UNIX); s.connect("/run/lxd-installer.socket"); s.send(b"x"); s.recv(1)'
fi
exec $SNAP_BIN "$@"

$ snap list
No snaps are installed yet. Try 'snap install hello-world'.
```

AFAICS this wrapper relies on lxd-installer, which is a package, so I am filing this against that package and against the images in general.

The wrapper holds that connection open until the installer has brought in lxd, and then hands over to the real /snap/bin/lxc.
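
To make the handshake easier to follow, here is a minimal Python sketch of what the one-liner in the wrapper does, paired with a stand-in for the service side. The stand-in (`fake_installer`), the temporary socket path and the reply byte are assumptions for illustration only; the report does not show what the real lxd-installer-service writes back, and `recv(1)` would also return on a clean close.

```python
import os
import socket
import tempfile
import threading


def wait_for_installer(socket_path: str) -> bytes:
    """Client side: send one byte, then block until the peer answers."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.send(b"x")
        return s.recv(1)  # b"" would mean the peer closed without answering


def fake_installer(server: socket.socket, work) -> None:
    """Hypothetical stand-in for the service: accept once, work, then reply."""
    conn, _ = server.accept()
    with conn:
        conn.recv(1)     # the request byte from the client
        work()           # the real service would install the lxd snap here
        conn.send(b"x")  # assumed reply that releases the waiting client


def demo() -> bytes:
    """Run client and stand-in service against a throwaway Unix socket."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo-installer.socket")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            t = threading.Thread(target=fake_installer,
                                 args=(server, lambda: None))
            t.start()
            reply = wait_for_installer(path)
            t.join()
            return reply
```

If the service side exits before answering, the client's `recv` either returns empty or raises, which matches the tracebacks further down in this report.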

# cat /lib/systemd/system/lxd-installer\@.service
[Unit]
Description=Helper to install lxd snap on demand

[Service]
ExecStart=/bin/sh -eux /usr/share/lxd-installer/lxd-installer-service
StandardInput=socket
StandardOutput=socket
StandardError=journal
Restart=no

The socket unit is up after boot, as one would expect.
And usually it even works fine:

$ lxc launch ubuntu-minimal-daily:j j-test --ephemeral --vm
Creating j-test
Starting j-test
$ lxc exec j-test bash
root@j-test:~# systemctl status lxd-installer.socket
● lxd-installer.socket - Helper to install lxd snap on demand
     Loaded: loaded (/lib/systemd/system/lxd-installer.socket; enabled; vendor preset: enabled)
     Active: active (listening) since Thu 2023-10-12 08:00:29 UTC; 4s ago
     Listen: /run/lxd-installer.socket (Stream)
   Accepted: 0; Connected: 0;
      Tasks: 0 (limit: 1171)
     Memory: 0B
        CPU: 374us
     CGroup: /system.slice/lxd-installer.socket

Oct 12 08:00:29 j-test systemd[1]: Starting Helper to install lxd snap on demand...
Oct 12 08:00:29 j-test systemd[1]: Listening on Helper to install lxd snap on demand.
root@j-test:~# lxc list
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+

But if the install has ever failed, it looks like this instead:

root@m:~# systemctl status lxd-installer.socket
● lxd-installer.socket - Helper to install lxd snap on demand
     Loaded: loaded (/lib/systemd/system/lxd-installer.socket; enabled; preset: enabled)
     Active: active (listening) since Thu 2023-09-28 09:47:48 UTC; 1 week 6 days ago
   Triggers: ● lxd-installer@3-13717-0.service
             ● lxd-installer@1-13484-0.service
             ● lxd-installer@2-13655-0.service
             ● lxd-installer@0-13372-0.service
     Listen: /run/lxd-installer.socket (Stream)
   Accepted: 4; Connected: 0;
      Tasks: 0 (limit: 38254)
     Memory: 0B
        CPU: 580us
     CGroup: /system.slice/lxd-installer.socket

Sep 28 09:47:48 m systemd[1]: Starting lxd-installer.socket - Helper to install lxd snap on demand...
Sep 28 09:47:48 m systemd[1]: Listening on lxd-installer.socket - Helper to install lxd snap on demand.
root@m:~# lxc list
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ConnectionResetError: [Errno 104] Connection reset by peer
/usr/sbin/lxc: 6: exec: /snap/bin/lxc: not found

My initial case of hitting this was a failure whose cause was unknown at first:

Oct 12 07:31:35 m systemd[1]: lxd-installer@0-13372-0.service: Failed with result 'exit-code'.
Oct 12 07:32:55 m systemd[1]: Started lxd-installer@1-13484-0.service - Helper to install lxd snap on demand (PID 13484/UID 0).
Oct 12 07:32:55 m systemd[1]: lxd-installer@1-13484-0.service: Main process exited, code=exited, status=1/FAILURE

It turned out to be caused by:
snap install lxd
error: system does not fully support snapd: The "fuse" filesystem is required on this system but
       not available. Please try to install the fuse package.

But you do not have to re-create that.
The issue is easy to reproduce with the impatience simulator (Ctrl+C):

$ lxc launch ubuntu-minimal-daily:j j-test --ephemeral --vm
Creating j-test
Starting j-test
$ lxc exec j-test bash

# abort this first time to simulate any reason it might fail
root@j-test:~# lxc list
^CTraceback (most recent call last):
  File "<string>", line 1, in <module>
KeyboardInterrupt

# now see it never coming back to life
root@j-test:~# lxc list
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ConnectionResetError: [Errno 104] Connection reset by peer
/usr/sbin/lxc: 6: exec: /snap/bin/lxc: not found

This happens because the service counts as started and keeps running in the background:
$ ps axlf | grep installer
0 0 753 240 20 0 4020 2092 ? S+ pts/0 0:00 \_ grep --color=auto installer
4 0 556 1 20 0 2888 948 ? Ss ? 0:00 /bin/sh -eux /usr/share/lxd-installer/lxd-installer-service

If you wait long enough, it will recover.

There are a few scenarios I can think of:
1. a boot race: the socket is not yet up, which triggers the same issue
2. a transient issue occurred and lxd-installer-service is still running in the background
3. there is a permanent "lxd-installer will fail" problem

In cases #1 and #2 the wrapper should detect the situation and wait for the job that is about to start or is already running.
But it needs to be able to tell those apart from #3, in which case it has to give up at some point.

Maybe something simple like a retry + timeout logic might provide all of that?
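
A rough sketch of what such retry + timeout logic could look like, written in Python since the wrapper already calls python3. This is only an illustration of the idea, not the fix that was later uploaded. The function names, the set of errors treated as transient, and the timeout and interval values are all assumptions.

```python
import socket
import time

# Errors assumed to be possibly transient: socket file not created yet
# (boot race), nobody listening, or the service instance died mid-request.
TRANSIENT = (FileNotFoundError, ConnectionRefusedError, ConnectionResetError)


def ask_installer(socket_path: str) -> None:
    """One attempt: the same handshake as the wrapper's python3 one-liner."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.send(b"x")
        s.recv(1)


def retry_until(attempt, timeout_s: float, interval_s: float = 1.0,
                clock=time.monotonic, sleep=time.sleep) -> bool:
    """Call attempt() until it succeeds or timeout_s has elapsed.

    Returns True on success and False once the deadline has passed,
    which is the "give up" path for a permanent failure (scenario 3).
    """
    deadline = clock() + timeout_s
    while True:
        try:
            attempt()
            return True
        except TRANSIENT:
            if clock() >= deadline:
                return False
            sleep(interval_s)
```

The wrapper could then call something like `retry_until(lambda: ask_installer("/run/lxd-installer.socket"), timeout_s=300)` and print a clear error when it returns False. Note that with the socket unit shown above every new connection triggers a new service instance, so a real implementation would have to make sure retries do not pile up parallel installs.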

no longer affects: cloud-images
summary: - lxd-installer is not idempotent
+ lxd-installer can race or temp-fail and then block itself
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxd-installer - 2

---------------
lxd-installer (2) noble; urgency=medium

  * Pick the proper snap channel (LP: #2043843)
  * Wait for snap bin to be present before continuing (LP: #2039148)
  * Inform the user that LXD snap is being installed (LP: #2039584)

 -- Simon Deziel <email address hidden> Fri, 17 Nov 2023 15:36:50 -0500

Changed in lxd-installer (Ubuntu):
status: New → Fix Released
Simon Déziel (sdeziel) wrote :

The bug was not completely/properly fixed, especially not in Noble:

```
$ lxc launch ubuntu-minimal-daily:24.04 c1; sleep 3.5; lxc exec c1 -- lxc list
Creating c1
Starting c1
Installing LXD snap, please be patient.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ConnectionResetError: [Errno 104] Connection reset by peer
```

It seems to be due to snapd not being fully seeded when the `snap install lxd` request comes in:

```
$ lxc delete -f c1; lxc launch ubuntu-minimal-daily:24.04 c1; sleep 3.5; lxc exec c1 -- systemctl status snapd.seeded.service; lxc exec c1 -- lxc list; lxc exec c1 -- sh -c 'systemctl status lxd-installer@*.service'
Creating c1
Starting c1
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/usr/lib/systemd/system/snapd.seeded.service; enabled; preset: enabled)
     Active: activating (start) since Tue 2024-06-04 20:49:32 UTC; 636ms ago
   Main PID: 333 (snap)
      Tasks: 6 (limit: 36997)
     Memory: 13.2M (peak: 13.7M)
        CPU: 550ms
     CGroup: /system.slice/snapd.seeded.service
             └─333 /usr/bin/snap wait system seed.loaded

Jun 04 20:49:32 c1 systemd[1]: Starting snapd.seeded.service - Wait until snapd is fully seeded...
Installing LXD snap, please be patient.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ConnectionResetError: [Errno 104] Connection reset by peer
/usr/sbin/lxc: 12: exec: /snap/bin/lxc: not found
Error: Command not found
× lxd-installer@0-411-0.service - Helper to install lxd snap on demand (PID 411/UID 0)
     Loaded: loaded (/usr/lib/systemd/system/lxd-installer@.service; static)
     Active: failed (Result: exit-code) since Tue 2024-06-04 20:49:33 UTC; 1min 30s ago
   Duration: 240ms
TriggeredBy: ● lxd-installer.socket
    Process: 414 ExecStart=/bin/sh -eux /usr/share/lxd-installer/lxd-installer-service (code=exited, status=10)
   Main PID: 414 (code=exited, status=10)
        CPU: 24ms

Jun 04 20:49:33 c1 sh[416]: + PRIVACY_POLICY_URL=https://www.ubuntu.com/legal/terms-and-policies/privacy-policy
Jun 04 20:49:33 c1 sh[416]: + UBUNTU_CODENAME=noble
Jun 04 20:49:33 c1 sh[416]: + LOGO=ubuntu-logo
Jun 04 20:49:33 c1 sh[416]: + track=5.21
Jun 04 20:49:33 c1 sh[416]: + [ -n 24.04 ]
Jun 04 20:49:33 c1 sh[416]: + echo 5.21/stable/ubuntu-24.04
Jun 04 20:49:33 c1 sh[414]: + snap install lxd --channel=5.21/stable/ubuntu-24.04
Jun 04 20:49:33 c1 sh[417]: error: too early for operation, device not yet seeded or device model not acknowledged
Jun 04 20:49:33 c1 systemd[1]: lxd-installer@0-411-0.service: Main process exited, code=exited, status=10/n/a
Jun 04 20:49:33 c1 systemd[1]: lxd-installer@0-411-0.service: Failed with result 'exit-code'.
```

This same error can be reproduced more simply with:

```
$ lxc launch ubuntu-minimal-daily:24.04 c1; sleep 3; lxc exec c1 -- snap install lxd
Creating c1
Starting c1
error: too early for operation, device not yet seeded or device model not acknowledged
```
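
One way to avoid the "too early for operation" error is to block on seeding before asking for the install. The sketch below reuses the `snap wait system seed.loaded` command that snapd.seeded.service runs in the status output above. It is a Python illustration under assumptions, not the code of lxd-installer-service (which is a shell script). The function name, the injectable `run` parameter and the timeout value are hypothetical.

```python
import subprocess


def install_lxd_when_seeded(channel: str, run=subprocess.run,
                            seed_timeout_s: float = 600.0) -> None:
    """Wait for snapd seeding to finish, then install the lxd snap."""
    # Same command that snapd.seeded.service runs; it returns once
    # seeding is done. The timeout is an assumed upper bound so that a
    # system that never finishes seeding does not hang forever.
    run(["snap", "wait", "system", "seed.loaded"],
        check=True, timeout=seed_timeout_s)
    # Only now ask snapd for the install; check=True raises on failure.
    run(["snap", "install", "lxd", f"--channel={channel}"], check=True)
```

For example, `install_lxd_when_seeded("5.21/stable/ubuntu-24.04")` would use the channel seen in the Noble log above. Passing a different `run` callable makes it possible to exercise the ordering without snapd being present.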

Changed in lxd-installer (Ubuntu):
status: Fix Released → Triaged
Simon Déziel (sdeziel) wrote :

I confirmed that the "snapd not seeded" problem affects Focal and later.

Changed in lxd-installer (Ubuntu Focal):
status: New → Confirmed
Changed in lxd-installer (Ubuntu Jammy):
status: New → Confirmed
Changed in lxd-installer (Ubuntu Noble):
status: New → Confirmed
Changed in lxd-installer (Ubuntu Mantic):
status: New → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxd-installer - 6

---------------
lxd-installer (6) oracular; urgency=medium

  * Makefile: install scripts into /usr/sbin (instead of /sbin)
  * Makefile: make lxd-installer script executable
  * lxd-installer-service: set shell options
  * Install LXD from $LTS/stable/ubuntu-$VERSION as default channel
    (LP: #2067425):
    - lxd-installer-service: use $LTS/stable/ubuntu-$VERSION as default channel
    - d/tests/install-on-demand: update expected channel
  * d/lxd-installer@.service: remove wrapper shell
  * d/lxd-installer.socket: don't start service if /snap/bin/lxd is detected
  * d/tests/*: start lxd-installer.socket after purging LXD snap
  * scripts/lxc: use variable substitution to avoid forking to basename
  * Wait for snapd to be seeded before asking for LXD snap to be installed:
    (LP: #2039148)
    - scripts/lxc: wait for lxd-installer.socket to be present
    - lxd-installer-service: wait for snapd to be seeded
  * d/*: refresh packaging
    - d/control: switch to debhelper-compat 12
    - d/compat: remove now unneeded file
    - d/control: make it explicit that root is not required to build
    - d/control: bump standards-version to 4.5.0 (no change required)

 -- Simon Deziel <email address hidden> Tue, 11 Jun 2024 15:18:20 -0400

Changed in lxd-installer (Ubuntu Oracular):
status: Triaged → Fix Released
Brian Murray (brian-murray) wrote :

Ubuntu 23.10 (Mantic Minotaur) has reached end of life, so this bug will not be fixed for that specific release.

Changed in lxd-installer (Ubuntu Mantic):
status: Confirmed → Won't Fix