lxd-installer can race or temp-fail and then block itself
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| lxd-installer (Ubuntu) | Status tracked in Oracular | | | |
| Focal | Confirmed | Undecided | Unassigned | |
| Jammy | Confirmed | Undecided | Unassigned | |
| Mantic | Won't Fix | Undecided | Unassigned | |
| Noble | Confirmed | Undecided | Unassigned | |
| Oracular | Fix Released | Undecided | Unassigned | |
Bug Description
Hey,
while checking for some other issue, I realized that the pre-installed LXD isn't always working.
It is fully pre-installed on cloud images and server installs to give users quick access to a great feature.
But in minimal images it is not installed (which is fine, since they are meant to be minimal), yet it is not fully gone either, and what is left fails without a clear indication to the (uninformed) user.
What remains there is `lxd-installer`.
Normal image:
```
$ snap list | grep lxd
lxd 5.0.2-838e1b2 24322 5.0/stable/… canonical** -
$ which lxc
/snap/bin/lxc
$ lxc list
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm
+------
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------
```
A minimal image, on the other hand, has this:
```
$ which lxc
/usr/sbin/lxc
$ dpkg -S /usr/sbin/lxc
dpkg-query: no path found matching pattern /usr/sbin/lxc
$ cat /usr/sbin/lxc
#!/bin/sh
SNAP_BIN=
if [ ! -f ${SNAP_BIN} ]; then
python3 -c 'import socket; s=socket.
fi
exec $SNAP_BIN "$@"
$ snap list
No snaps are installed yet. Try 'snap install hello-world'.
```
AFAICS this is trying to use lxd-installer, which is a package, so let me file this bug against it and against images in general.
The wrapper tries to hold the connection until the installer has brought in LXD, and then hands the command over to the snap's `lxc`.
```
# cat /lib/systemd/
[Unit]
Description=Helper to install lxd snap on demand
[Service]
ExecStart=/bin/sh -eux /usr/share/
StandardInput=
StandardOutput=
StandardError=
Restart=no
```
This unit is up (as a socket) after start, as one would expect.
And usually it even works fine:
```
$ lxc launch ubuntu-
Creating j-test
Starting j-test
$ lxc exec j-test bash
root@j-test:~# systemctl status lxd-installer.
● lxd-installer.
Loaded: loaded (/lib/systemd/
Active: active (listening) since Thu 2023-10-12 08:00:29 UTC; 4s ago
Listen: /run/lxd-
Accepted: 0; Connected: 0;
Tasks: 0 (limit: 1171)
Memory: 0B
CPU: 374us
CGroup: /system.
Oct 12 08:00:29 j-test systemd[1]: Starting Helper to install lxd snap on demand...
Oct 12 08:00:29 j-test systemd[1]: Listening on Helper to install lxd snap on demand.
root@j-test:~# lxc list
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm
+------
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------
```
But if this ever failed, it looks like this instead:
```
root@m:~# systemctl status lxd-installer.
● lxd-installer.
Loaded: loaded (/lib/systemd/
Active: active (listening) since Thu 2023-09-28 09:47:48 UTC; 1 week 6 days ago
Triggers: ● lxd-installer@
● lxd-installer@
● lxd-installer@
● lxd-installer@
Listen: /run/lxd-
Accepted: 4; Connected: 0;
Tasks: 0 (limit: 38254)
Memory: 0B
CPU: 580us
CGroup: /system.
Sep 28 09:47:48 m systemd[1]: Starting lxd-installer.
Sep 28 09:47:48 m systemd[1]: Listening on lxd-installer.
root@m:~# lxc list
Traceback (most recent call last):
File "<string>", line 1, in <module>
ConnectionReset
/usr/sbin/lxc: 6: exec: /snap/bin/lxc: not found
```
I initially hit this through a failure that I could not explain at first:
```
Oct 12 07:31:35 m systemd[1]: lxd-installer@
Oct 12 07:32:55 m systemd[1]: Started lxd-installer@
Oct 12 07:32:55 m systemd[1]: lxd-installer@
```
It turned out to be caused by:
```
snap install lxd
error: system does not fully support snapd: The "fuse" filesystem is required on this system but
not available. Please try to install the fuse package.
```
But you do not have to recreate this.
Reproducing it is not too hard using the impatience simulator (Ctrl+C):
```
$ lxc launch ubuntu-
Creating j-test
Starting j-test
$ lxc exec j-test bash
# abort this first time to simulate any reason it might fail
root@j-test:~# lxc list
^CTraceback (most recent call last):
File "<string>", line 1, in <module>
KeyboardInterrupt
# Now see it never come back to life
root@j-test:~# lxc list
Traceback (most recent call last):
File "<string>", line 1, in <module>
ConnectionReset
/usr/sbin/lxc: 6: exec: /snap/bin/lxc: not found
```
This is because the service counts as started and keeps running in the background:
```
$ ps axlf | grep installer
0 0 753 240 20 0 4020 2092 ? S+ pts/0 0:00 \_ grep --color=auto installer
4 0 556 1 20 0 2888 948 ? Ss ? 0:00 /bin/sh -eux /usr/share/
```
If you wait long enough, it recovers.
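The wrapper's Python one-liner is truncated above, so its exact behaviour is not visible here. Going only by the description (hold the connection until the installer is done, then hand over), a minimal sketch of a client that reports a reset clearly, instead of printing a traceback and then failing in `exec`, could look like this. `wait_on_installer`, `ensure_lxc` and the `socket_path`/`snap_bin` parameters are hypothetical; the real socket path is cut off in the status output (`/run/lxd-`).

```python
import os
import socket
import sys


def wait_on_installer(sock) -> bool:
    """Block until the installer side closes the connection.

    Returns True on a clean close (EOF) and False if the peer reset the
    connection, e.g. because that installer instance failed or was cut short.
    """
    try:
        while sock.recv(1024):
            pass  # any payload is ignored; only the close matters
        return True
    except ConnectionResetError:
        return False


def ensure_lxc(socket_path: str, snap_bin: str) -> None:
    """Hypothetical wrapper flow: trigger the installer only if needed."""
    if os.access(snap_bin, os.X_OK):
        return  # already installed, nothing to wait for
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)  # raises OSError if the socket is not up yet
        finished = wait_on_installer(s)
    if not finished or not os.access(snap_bin, os.X_OK):
        # Say what went wrong instead of "exec: ...: not found".
        sys.exit(f"LXD is not installed ({snap_bin} is missing): the "
                 "on-demand installation did not complete, see the "
                 "lxd-installer journal for details")
```

This sketch still gives up on the first reset or refused connection; it only makes the failure readable rather than retrying.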
There are a few scenarios I can think of:
1. a boot race: the socket is not up yet, which triggers the same issue
2. a transient issue occurred, lxd-installer-
3. there is a permanent "lxd-installer will fail" problem
For #1 and #2, lxd-installer should detect this and wait for the job that is about to start or is already running.
But it needs to be able to tell these apart from #3, in which case it has to give up at some point.
Maybe something simple like retry + timeout logic could provide all of that?
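A minimal sketch of the retry + timeout idea, assuming each attempt either succeeds, returns False, or raises an `OSError` such as `ConnectionRefusedError` (socket not up yet, #1) or `ConnectionResetError` (#2). `retry_until` and its parameters are hypothetical names; the clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable


def retry_until(
    attempt: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `attempt` until it returns True or `timeout` seconds have passed.

    OSError (socket missing or refused, connection reset) is treated as
    transient, as in scenarios #1 and #2; running out of time is treated
    as the permanent case #3.
    """
    deadline = clock() + timeout
    while True:
        try:
            if attempt():
                return True
        except OSError:
            pass  # transient: socket not up yet, or an install already running
        if clock() >= deadline:
            return False  # give up: most likely a permanent failure
        sleep(interval)
```

The real design decision is the timeout: too short and a slow snap download looks like #3, too long and a user with a permanent problem (such as the missing fuse above) waits for nothing.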
no longer affects: cloud-images
summary: changed from "lxd-installer is not idempotent" to "lxd-installer can race or temp-fail and then block itself"
Changed in lxd-installer (Ubuntu):
status: Fix Released → Triaged
This bug was fixed in the package lxd-installer - 2
lxd-installer (2) noble; urgency=medium
* Pick the proper snap channel (LP: #2043843)
* Wait for snap bin to be present before continuing (LP: #2039148)
* Inform the user that LXD snap is being installed (LP: #2039584)
-- Simon Deziel <email address hidden> Fri, 17 Nov 2023 15:36:50 -0500
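The changelog does not say how "wait for snap bin to be present" is implemented, and the real package is a shell script. As a hedged illustration only, a Python version of such a bounded wait could poll for the executable until a deadline. `wait_for_executable` and its `timeout`/`interval` values are hypothetical.

```python
import os
import time


def wait_for_executable(path: str, timeout: float, interval: float = 0.5) -> bool:
    """Poll until `path` exists and is executable, or `timeout` seconds pass.

    Returns True as soon as the file is usable, False if the deadline
    expires first, so the caller can print a clear error instead of
    failing later in exec.
    """
    deadline = time.monotonic() + timeout
    while not os.access(path, os.X_OK):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller could, for example, run `wait_for_executable("/snap/bin/lxc", timeout=300)` before exec'ing the binary; the 300-second value is arbitrary here.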