Comment 5 for bug 2007827

Dave Jones (waveform) wrote : Re: flash-kernel failure when upgrading f-k and kernel in the same cycle

Hmm, this is a tricky one and there's a lot I don't like here.

The first thing I don't like is I probably caused this issue back in focal :). I was fixing an issue of flash-kernel triggers getting "forgotten" during flash-kernel updates (LP: #1667742).

The second thing I don't like is that we apparently haven't noticed this (quite serious) issue in all the releases since. That's ... suspicious? I would at least have expected this to have cropped up for some people on the pi images which ought to be seeing something similar and are quite widely used.

The third thing I'm not particularly keen on is the proposed fix: it causes flash-kernel to exit *assuming* it'll be run in future without actually guaranteeing that (contrast with the trigger deferral logic near the top of main where flash-kernel triggers itself and then exits to ensure it will definitely be run later). It also doesn't necessarily account for the scenario where flash-kernel is being upgraded to fix an issue in flash-kernel's historic behaviour. If the current kernel needs re-installing anyway (because there was some defect in the way an old version of f-k handled things), even when a new kernel version will be installed later, that should still be done as we can't guarantee that the new kernel's initrd will be generated successfully.

Hence, the two things we really want to avoid:

* Exiting without error when an error has actually occurred (i.e. the initrd is missing, but it's *actually* missing not just "waiting to be generated")

* "Skipping" flash-kernel executions when flash-kernel itself is being upgraded, even when they might *seem* redundant (they're only redundant in the case that everything works correctly, but the error scenarios are valid edge cases)

I've spent a few hours digging into all the trigger logic that exists between the kernel images, initramfs-tools (responsible for generating the initrd), and flash-kernel. I'm reasonably convinced at this point that we can't *prevent* flash-kernel from running at a point where a new kernel isn't *completely* installed. The fact that flash-kernel *must* run when it itself is upgraded means we're always vulnerable to being run at that point.

However, flash-kernel is being run to install "the latest version". It uses "linux-version list" to get the installed kernels. That operation lists the kernels which *exist* (as, say, /boot/vmlinuz-5.15.0-1018-xilinx-zynqmp) under /boot, but doesn't care whether the kernel package is actually *installed fully* by dpkg yet.

For example: /boot/vmlinuz-5.15.0-1018-xilinx-zynqmp exists because the linux-image-5.15.0-1018-xilinx-zynqmp package has been unpacked, but it must still be in "triggers-awaiting" state because the initramfs-tools trigger has not yet run to generate the corresponding initrd.

So ... I think the root of the issue here is that flash-kernel is not being sufficiently discriminating in its selection of the "latest" kernel; it considers not-fully-installed kernels to be valid for installation. In fact, even if the initrd *does* exist, if triggers are still pending it still shouldn't consider that a valid candidate (this can occur if, for example, a linux-modules-extra package has been removed so the initrd needs to be regenerated to remove the modules provided by it).

Conclusion:

I should enhance the filters after "linux-version list" to exclude kernels from packages which do not have a status of "Installed". One drawback: in the scenario above, this will result in flash-kernel "pointlessly" re-installing 5.15.0-1015-xilinx-zynqmp (the "old", but current, kernel) on its first "real" run, rather than silently skipping it. However, I don't think that's an error: consider that the flash-kernel upgrade may very well be correcting something in flash-kernel which requires it to re-run for the current kernel.
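To make the idea concrete, here's a rough sketch of the filter (in Python for readability; flash-kernel itself is shell, and this is not the actual patch). It assumes each kernel version maps to a package named "linux-image-<version>", which won't hold for every flavour, and it assumes a dpkg-query new enough to support the ${db:Status-Status} format field. The function names are made up for illustration.

```python
import subprocess


def package_status(package):
    """Return dpkg's status word for a package, or None if dpkg doesn't know it.

    Assumes dpkg-query supports the ${db:Status-Status} format field.
    """
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${db:Status-Status}", package],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def fully_installed(versions, status_of=package_status):
    """Keep only the kernel versions whose package is fully installed.

    Assumption: the owning package is named "linux-image-<version>".
    Anything merely unpacked or still waiting on triggers is dropped.
    """
    return [
        version
        for version in versions
        if status_of("linux-image-" + version) == "installed"
    ]
```

The status lookup is passed in as a parameter so the filter can be exercised without touching the real dpkg database, for instance with a plain dict's "get" method standing in for dpkg-query.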

This should still operate correctly in the event that f-k is used with "--force" to install a specified version of the kernel; the filtration is only used to determine the "latest" kernel and the system admin is still free to override this with their choice of version (even unpackaged versions that are manually installed in /boot) via the "--force" flag.
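Sketching the selection order described here (again in Python purely for illustration, with hypothetical names): an explicitly requested version wins outright, and the filtered list only matters when nothing was requested. This assumes the candidate list has already been reduced to fully installed kernels and is ordered oldest to newest, as "linux-version sort" would give.

```python
def choose_kernel(candidates, forced_version=None):
    """Pick the kernel version to flash.

    candidates: versions already filtered to fully installed packages,
    assumed to be ordered oldest to newest.
    forced_version: a version the admin asked for explicitly; it bypasses
    the filter entirely, so it may name a manually installed kernel.
    Returns None when there is nothing suitable to install.
    """
    if forced_version is not None:
        return forced_version
    if not candidates:
        return None
    return candidates[-1]
```

In the scenario from this bug, the candidate list would contain only 5.15.0-1015-xilinx-zynqmp during the flash-kernel upgrade, so that is what gets (re-)installed unless the admin forces something else.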