MAAS commissioning process should attempt to use LXD socket

Bug #1887996 reported by Lee Trager
This bug affects 1 person
Affects  Status     Importance  Assigned to  Milestone
MAAS     Won't Fix  Medium      Unassigned

Bug Description

MAAS 2.7 changed how it gathers commissioning data: it now collects it from LXD. When this was implemented, the MAAS team agreed to build a custom binary which outputs the values from various LXD API endpoints, so LXD doesn't have to be installed during commissioning.

20.04 LTS and above cloud images include the LXD Snap. The Snap provides a socket to interact with the API, and the endpoints MAAS needs don't require LXD to be configured. The LXD API also contains a list of api_extensions which MAAS can use to verify that the Snap version of LXD has all the data MAAS requires.

There are two advantages to doing this:
1. Commissioning won't have to download the machine-resources binary, saving ~6M per machine.
2. MAAS users will receive LXD fixes quicker. The LXD team releases updates much more frequently than MAAS. A fix to LXD is often out within 2 weeks, while it can take over a month for MAAS to release a fix. LXD also releases a nightly edge Snap, so MAAS users could write a custom commissioning script to install the LXD edge Snap and receive LXD fixes the next day.

Essentially, 50-maas-01-commissioning would be updated to check if /var/snap/lxd/common/lxd/unix.socket exists. If it doesn't, download the machine-resources binary. If it does, verify that host_info contains all API extensions MAAS requires, then gather resources, and possibly networking.

host_info - The base object 50-maas-01-commissioning returns
curl -G --unix-socket "/var/snap/lxd/common/lxd/unix.socket" "lxd/1.0"

resources - Included as a subobject in 50-maas-01-commissioning, where most commissioning data comes from.
curl -G --unix-socket "/var/snap/lxd/common/lxd/unix.socket" "lxd/1.0/resources"

networks - For when MAAS starts processing network data from LXD
curl -G --unix-socket "/var/snap/lxd/common/lxd/unix.socket" "lxd/1.0/networks"
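
The flow proposed above could be sketched in shell roughly as follows. This is only an illustration under stated assumptions, not the actual 50-maas-01-commissioning script: the helper names pick_source, lxd_get and gather are hypothetical, and the fallback branch merely reports that the machine-resources binary would be needed rather than downloading it.

```shell
#!/bin/sh
# Sketch only: decide where commissioning data would come from.
# The default path is the LXD snap socket quoted in this report.
LXD_SOCKET="${LXD_SOCKET:-/var/snap/lxd/common/lxd/unix.socket}"

# Print "socket" if the given path is a unix socket, otherwise "binary".
# (hypothetical helper, not part of MAAS)
pick_source() {
    if [ -S "$1" ]; then
        echo "socket"
    else
        echo "binary"
    fi
}

# Query an LXD API path over the socket, mirroring the curl examples above.
lxd_get() {
    curl -s -G --unix-socket "$LXD_SOCKET" "lxd$1"
}

# Gather host_info and resources from the socket, or signal that the
# machine-resources binary is needed instead (hypothetical fallback).
gather() {
    case "$(pick_source "$LXD_SOCKET")" in
        socket)
            lxd_get /1.0            # host_info
            lxd_get /1.0/resources  # resources
            ;;
        binary)
            echo "LXD socket not found; use the machine-resources binary" >&2
            return 1
            ;;
    esac
}
```

Nothing runs until gather is called; a commissioning script would call it and fall back to the binary when it returns non-zero.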

Alberto Donato (ack) wrote :

FTR you can use "lxc query /1.0/resources" rather than curl, so you don't have to specify the socket path or other options

Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → next
Björn Tillenius (bjornt) wrote :

I'm not convinced we should do this. By building the binary ourselves, we have a stable integration point, and we don't have to be part of the LXD release process to ensure changes won't break MAAS.

Furthermore, I'm also not convinced it will be easier to get fixes. Currently issues tend to get fixed in master, and we can pull them in without any problems. We also don't have to care about backward compatibility.

If we use the lxd snap, we need to make sure that changes don't break us. Also, for example, focal currently has 4.0 installed by default. So getting a fix would mean getting it backported to 4.0, which I don't think would be trivial.

Lee Trager (ltrager) wrote :

LXD is very good at versioning and keeping track of api_extensions. One of the reasons why I include LXD hostinfo data, even in our binary, is so MAAS can check that required api_extensions are included. For example LXD added "resources_usb_pci" when USB and PCI device support was added to the resources endpoint even though the API version didn't change.
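
A check of this kind could be sketched as below. It is a minimal illustration, assuming the host_info JSON (the response from the 1.0 endpoint) is piped in on stdin; the function name missing_extensions is hypothetical, and it uses a plain-text match for the quoted extension name instead of parsing the JSON, so a real script would want a proper JSON parser.

```shell
# Sketch: print the required api_extensions that are absent from host_info.
# Reads the host_info JSON on stdin; required names are the arguments.
# (hypothetical helper; text matching stands in for real JSON parsing)
missing_extensions() {
    info=$(cat)
    missing=""
    for ext in "$@"; do
        case "$info" in
            *"\"$ext\""*) ;;                  # name appears as a quoted string
            *) missing="$missing $ext" ;;     # not found, record it
        esac
    done
    # Drop the leading space; prints an empty line if nothing is missing.
    echo "${missing# }"
}
```

For example, piping the output of the host_info curl command into `missing_extensions resources_usb_pci` would print nothing when the extension is present, and the extension name when it is not.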

LXD has to include fixes for MAAS regardless; this was agreed to when we switched to LXD for commissioning information. MAAS relies on this when adding LXD VMHosts. There, LXD information is gathered remotely via the LXD API, and we don't have a way to use the binary.

It's pretty simple to update a snap with a commissioning script:

# --- Start MAAS 1.0 script metadata ---
# name: 01-update-lxd
# script_type: commissioning
# packages:
#   snap:
#     - name: lxd
#       channel: latest/edge
# --- End MAAS 1.0 script metadata ---

Alberto Donato (ack) wrote :

Note that the snap being installed doesn't mean that lxd is running.

"lxd init" must be run before the daemon is set up (I'm not totally sure if the snap does that on install with defaults).

Also, the daemon uses socket activation, so it won't start by default at startup, and the first request to the socket will make it start. This could take a bit of time, though, so it might not necessarily be faster than using the binary.

Lee Trager (ltrager) wrote :

LXD runs by default in the ephemeral environment. If you watch the MAAS ephemeral boot process you can see it pause while LXD is starting up every time. You can access the socket without using lxd init. The attached commissioning script updates the LXD Snap to edge then reads hostinfo from the socket. Runtime on my system is 6 seconds.

Alberto Donato (ack) wrote :

I agree with Björn that we shouldn't use the LXD from the host, as MAAS relies on information provided by the resources binary from the same MAAS version, while a preinstalled LXD might be an older version and not provide all the info MAAS expects.

Changed in maas:
status: Triaged → Won't Fix
Changed in maas:
milestone: next → none