Comment 0 for bug 1875969

Revision history for this message
Ian Johnson (anonymouse67) wrote : uc20 the-tool has some issues

With the current state of the initrd, we have the following logic in "the-tool":

1. snap-bootstrap is called and outputs something to mount
2. the-tool attempts to perform the requested mount using systemd-mount, which starts a transient mount unit for the destination from the source, notably the-tool disables exiting on errors just to execute systemd-mount here.
3. the-tool keeps calling snap-bootstrap in a loop, exiting if snap-bootstrap exits non-zero, but otherwise continuing to run all the same

This has a few problems:

1. If systemd-mount ever starts failing, the-tool swallows this failure and keeps running inifintely
2. snap-bootstrap has no way to know that the-tool's attempt to mount something actually failed and thus has no way to really respond to any errors here (other than perhaps keeping track of it's own requests and relying on the fact that attempts to mount things are ignored so snap-bootstrap will always get invoked again no matter what)
3. We currently have systemd units which will effectively run systemd-fsck on ubuntu-seed using the by-label udev entry, but this is racy (see bug https://bugs.launchpad.net/snapd/+bug/1875968), and this also is undesirable in the world where we cross-check that partitions we are using/mounting/dealing with match the partition we booted the kernel from, because there could be another partition with the same label on a different disk (see bug https://bugs.launchpad.net/snapd/+bug/1863886)
4 AFAICT, systemd-mount will not ever re-try the mount that is requested when called again with the same arguments, if systemd-mount has been run before with the same source and destination I believe we need to _restart_ the unit as it goes into failed state and systemd-mount refuses to re-create the unit.

Issue 1 is particularly troublesome considering that we cannot get a debug shell in the initrd until the-tool has failed or exited properly.

Issue 2 is also somewhat troubling because if we consider the FDE case where something legitimately goes wrong in the field, since many of these devices are headless, it is desirable to actually do something in the case where systemd-mount fails. This could be because there is a bug in the new kernel we are trying to boot or there could be some sort of attack against the device, but regardless, we should try to put snap-bootstrap in a position where it could do something about the situation, even if that situation 99% of the time is just reboot and try again

Issue 3 is troubling because it means that snap-bootstrap is not the only one choosing partitions that should be mounted/inspected/verified/touched, and AFAIK, it is racy when you have multiple partitions with the same label (from different disks) as to which one gets populated in /dev/disk/by-label by udev.

Issue 4 is annoying because it means that we don't re-try mounts even if it failed, but that is not currently a blocker. Perhaps in a world where snap-bootstrap or the-tool knows that something failed, we could re-try the operation cleanly by restarting the mount unit.