Comment 1 for bug 1889555

Dan Watkins (oddbloke) wrote:

OK, so the issue we're dealing with here is that bug 1877491 fixed the grub install device for _new_ NVMe instances, but it did not fix it on existing NVMe instances. Existing instances will therefore still have an incorrect grub install device configured (something like /dev/sda).
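To check whether an instance is affected, the configured install device can be read back from debconf. `debconf-show` is the real tool here; the example output line is illustrative:

```shell
# Show grub-pc's debconf answers; install_devices is the grub install device.
sudo debconf-show grub-pc | grep install_devices
# On an affected NVMe instance this prints a stale device such as /dev/sda,
# while the actual root disk is something like /dev/nvme0n1.
```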

grub has two parts: the core image and its modules. These two components are expected to be updated in lockstep; if they are using different ABIs (i.e. they are not in lockstep), systems will fail to boot. The grub packaging keeps them in lockstep by reinstalling the core to the configured grub install device(s) whenever the package is updated.

For NVMe systems which have an incorrect grub install device configured (i.e. any which were launched before 2020/07/15), the grub packaging will fail to perform this core installation, because the configured device does not exist on the instance. This means that the core and modules will be using incompatible ABIs, so such systems will fail to boot when next rebooted.
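A minimal sketch of the failure mode, assuming a stale /dev/sda entry (the `DEVICE` value is hypothetical): grub's postinst is pointed at a block device node that is simply absent on an NVMe-only instance, so the core reinstall cannot happen.

```shell
# Hypothetical stale value read from debconf on an affected instance.
DEVICE=/dev/sda

# grub's postinst effectively runs grub-install against this device;
# on an NVMe-only instance the device node is absent, so it fails,
# leaving the on-disk core out of step with the updated modules.
if [ ! -b "$DEVICE" ]; then
    echo "stale grub install device: $DEVICE"
fi
```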

(Note that the core/modules ABI does not change on every update to the grub package, so this mismatched boot failure will not be observed on every grub update. This bug has been filed, however, because we _do_ have such a grub update pending/in progress.)

There is a grub bug (bug 1889556) for handling the case where the core and modules are mismatched, but the solution there will require manual user intervention. cloud-init can fix this for NVMe drives non-interactively by redetermining the grub install devices in its postinst, using its existing device-detection logic.
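The redetermination could look roughly like the following sketch. The helper name and structure are hypothetical, not cloud-init's actual code: the idea is just that a configured device which no longer exists is replaced by the disk the root filesystem actually lives on.

```python
import os


def grub_install_device(configured: str, root_disk: str) -> str:
    """Return the device grub's core should be installed to.

    configured: the current debconf grub-pc/install_devices value.
    root_disk:  the disk holding the root filesystem (e.g. /dev/nvme0n1).

    If the configured device is missing (a stale /dev/sda on an NVMe
    instance), fall back to the detected root disk.
    (Hypothetical helper for illustration, not cloud-init's real code.)
    """
    if os.path.exists(configured):
        return configured
    return root_disk
```

On an affected instance this would map a stale /dev/sda to /dev/nvme0n1; the new value would then be written back to debconf so that grub's packaging installs the core to a device that actually exists.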