uc20 the-tool has some design issues

Bug #1875969 reported by Ian Johnson on 2020-04-29
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ian Johnson

Bug Description

With the current state of the initrd, we have the following logic in "the-tool":

1. snap-bootstrap is called and outputs something to mount
2. the-tool attempts to perform the requested mount using systemd-mount, which starts a transient mount unit for the destination from the source, notably the-tool disables exiting on errors just to execute systemd-mount here.
3. the-tool keeps calling snap-bootstrap in a loop, exiting if snap-bootstrap exits non-zero, but otherwise continuing to run all the same

This has a few problems:

1. If systemd-mount ever starts failing, the-tool swallows this failure and keeps running inifintely
2. snap-bootstrap has no way to know that the-tool's attempt to mount something actually failed and thus has no way to really respond to any errors here (other than perhaps keeping track of it's own requests and relying on the fact that attempts to mount things are ignored so snap-bootstrap will always get invoked again no matter what)
3. We currently have systemd units which will effectively run systemd-fsck on ubuntu-seed using the by-label udev entry, but this is racy (see bug https://bugs.launchpad.net/snapd/+bug/1875968), and this also is undesirable in the world where we cross-check that partitions we are using/mounting/dealing with match the partition we booted the kernel from, because there could be another partition with the same label on a different disk (see bug https://bugs.launchpad.net/snapd/+bug/1863886)
4 AFAICT, systemd-mount will not ever re-try the mount that is requested when called again with the same arguments, if systemd-mount has been run before with the same source and destination I believe we need to _restart_ the unit as it goes into failed state and systemd-mount refuses to re-create the unit.
5. How systemd-mount creates mounts is not persistent across the switch root that happens when we are at the end of the initrd, and we should make sure the mounts are persistent across the switch root so that user-space snapd can take advantage of the fact that snap-bootstrap mounted certain things in specific places during the initrd.

Issue 1 is particularly troublesome considering that we cannot get a debug shell in the initrd until the-tool has failed or exited properly.

Issue 2 is also somewhat troubling because if we consider the FDE case where something legitimately goes wrong in the field, since many of these devices are headless, it is desirable to actually do something in the case where systemd-mount fails. This could be because there is a bug in the new kernel we are trying to boot or there could be some sort of attack against the device, but regardless, we should try to put snap-bootstrap in a position where it could do something about the situation, even if that situation 99% of the time is just reboot and try again

Issue 3 is troubling because it means that snap-bootstrap is not the only one choosing partitions that should be mounted/inspected/verified/touched, and AFAIK, it is racy when you have multiple partitions with the same label (from different disks) as to which one gets populated in /dev/disk/by-label by udev.

Issue 4 is annoying because it means that we don't re-try mounts even if it failed, but that is not currently a blocker. Perhaps in a world where snap-bootstrap or the-tool knows that something failed, we could re-try the operation cleanly by restarting the mount unit.

Issue 5 is also annoying because ideally user-space snapd should make the decision about what should be unmounted and such, but as the current implementation stands if a mount is "busy" it will persist, but if it is not busy it will not persist. Making this more robust would be great.

summary: - uc20 the-tool has some issues
+ uc20 the-tool has some design issues
description: updated
tags: added: core20
Changed in snapd:
assignee: nobody → Ian Johnson (anonymouse67)
importance: Undecided → High
status: New → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers