cc_grub_dpkg updates grub-pc or grub-efi debconf keys, but both can become incorrect on BIOS-booted Azure Ubuntu
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Expired | Undecided | Unassigned |
Bug Description
Platform: Azure (generation 1 VMs)
Image used: Ubuntu Server, SKU 20_04-lts (gen 1)
(cloud-init 22.4.2-
Azure generation 1 VMs boot in BIOS mode. Ubuntu ships with both BIOS and UEFI support installed, and cloud-init updates some debconf keys, presumably to avoid mismatches when boot packages get upgraded on new machines:
https:/
Even when booted in BIOS mode, updating EFI packages (e.g. grub-efi-
If a discrepancy occurs in the disk ID where GRUB is installed (one scenario below), cloud-init only updates the `grub-pc` debconf keys (link above). The mismatched `grub-efi` key can cause further EFI package upgrades to fail, requiring a user with a shell to confirm a dpkg configuration prompt.
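For illustration, a minimal sketch of how both debconf answers could be seeded from the detected disk ID (the key owner and names mirror the `debconf-show grub-pc` output later in this report; the disk ID and the `debconf-set-selections` invocation are assumptions on my part, not cloud-init code):

```python
# Hypothetical sketch: seed BOTH debconf answers (grub-pc/install_devices
# and grub-efi/install_devices) with the current disk ID, instead of only
# the grub-pc one. Key owner/names taken from `debconf-show grub-pc`.

def grub_selections(disk_id: str) -> str:
    """Build input for `debconf-set-selections` covering both keys."""
    device = f"/dev/disk/by-id/{disk_id}"
    return "\n".join([
        f"grub-pc grub-pc/install_devices multiselect {device}",
        f"grub-pc grub-efi/install_devices multiselect {device}",
    ])

# Applying would require root, e.g.:
#   subprocess.run(["debconf-set-selections"],
#                  input=grub_selections(disk_id), text=True, check=True)
```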
[scenario]
Sample scenario where we encountered this issue, using Packer to build a custom VM image:
* Packer creates a BIOS VM from the base Ubuntu 20.04 image (gen 1).
* cloud-init updates the `grub-pc` key:
```
2023-02-26 08:40:19,507 - cc_grub_
```
* Packages get upgraded. Upgrades to EFI packages result in "Installing grub to /boot/efi" (dpkg logs) and debconf `grub-efi/
* Customized VM gets saved by Packer as an image.
...
* Later, we spin up gen 1 (BIOS) VMs from that image. Its root disk has its own serial ID.
(GRUB partition = scsi-14d5346542
* cloud-init updates the `grub-pc` key:
```
2023-03-07 00:25:44,780 - cc_grub_
```
* Later, a headless `apt upgrade` breaks:
```
Setting up shim-signed (1.40.9+
mount: /var/lib/grub/esp: special device /dev/disk/
```
```
# debconf-show grub-pc | egrep "grub-efi/
* grub-efi/
* grub-pc/
```
[/scenario]
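The mismatch in the scenario above could be spotted before a headless upgrade trips over it by comparing the two debconf answers. A rough sketch (the line format assumes `debconf-show grub-pc` output as shown above; the helper name and the `-partN` stripping are my assumptions):

```python
import re

# Rough sketch: flag a discrepancy between the grub-pc and grub-efi
# debconf answers. Assumes `debconf-show grub-pc` lines like:
#   * grub-pc/install_devices: /dev/disk/by-id/scsi-...-part1
KEYS = ("grub-pc/install_devices", "grub-efi/install_devices")

def install_devices_mismatch(debconf_show_output: str) -> bool:
    """Return True when the two install_devices answers point at
    different disk IDs (ignoring the trailing -partN suffix)."""
    disk_ids = {}
    for line in debconf_show_output.splitlines():
        key, _, value = line.lstrip("* ").partition(":")
        if key.strip() in KEYS:
            # Compare the base disk ID, not the partition.
            disk_ids[key.strip()] = re.sub(r"-part\d+$", "", value.strip())
    return len(set(disk_ids.values())) > 1

sample = (
    "* grub-efi/install_devices: /dev/disk/by-id/scsi-OLD-part15\n"
    "* grub-pc/install_devices: /dev/disk/by-id/scsi-NEW-part1\n"
)
```

For the `sample` above the two base IDs differ, so the helper reports a mismatch.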
In this situation, when running `apt upgrade` to update an EFI package (or `dpkg --configure -a` once broken) in a shell, a user can manually confirm this prompt:
```
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│ The GRUB boot loader was previously installed to a disk that is no longer present, or whose    │
│ unique identifier has changed for some reason. It is important to make sure that the installed │
│ GRUB core image stays in sync with GRUB modules and grub.cfg. Please check again to make sure  │
│ that GRUB is written to the appropriate boot devices.                                          │
│                                                                                                │
│ GRUB install devices:                                                                          │
│                                                                                                │
│    [*] /dev/sda15 (111 MB; /boot/efi) on 32213 MB Virtual_Disk                                 │
│                                                                                                │
│                                             <Ok>                                              │
│                                                                                                │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
```
Accepting this prompt appears to update/fix the `grub-efi` debconf key.
In my testing, `DEBIAN_
Reminder: this is all on BIOS-booted VMs, as far as I know UEFI boot is never involved here.
This bug is a follow-up to a quick discussion on https:/
That recent change introduced support for updating the grub debconf keys on EFI-booted machines, but based on the boot mode: if EFI-booted, update the `grub-efi` debconf key, otherwise update `grub-pc`.
This unfortunately doesn't solve the case above, where BIOS machines have EFI configured and an intermediate/
My uneducated guess is that we may want cloud-init to update either/both debconf keys if BIOS and/or EFI support is *installed*, instead of checking the current boot mode (= presence of `/sys/firmware/
I do not know how to detect this. Presence of a grub-efi* package? Presence of /boot/efi?
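To make that guess concrete, here is a rough sketch of such a detection, using both heuristics from the question above (presence of `/boot/efi`, presence of a grub-efi* package). The package name `grub-efi-amd64` and the helper names are my assumptions, not cloud-init code:

```python
import os
import subprocess

def efi_support_installed() -> bool:
    """Heuristic: EFI support is present if /boot/efi exists or a
    grub-efi package is installed (package name is an assumption)."""
    if os.path.isdir("/boot/efi"):
        return True
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "-f", "${Status}", "grub-efi-amd64"],
            capture_output=True, text=True)
    except FileNotFoundError:  # not a Debian-based system
        return False
    return result.returncode == 0 and "installed" in result.stdout

def debconf_keys_to_update(efi_installed: bool) -> list:
    """Pick keys based on installed support, not the current boot mode."""
    keys = ["grub-pc/install_devices"]
    if efi_installed:
        keys.append("grub-efi/install_devices")
    return keys
```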
Changed in cloud-init:
status: New → Incomplete
(Additional notes)
Disclaimer: I've discovered these cloud-init details while troubleshooting this issue involving Azure, Ubuntu, GRUB, EFI, and Packer. My knowledge of cloud-init and grub EFI support is fresh and very limited.
This seems like a cloud-init issue given that cc_grub_dpkg.py exists in the first place to patch the `grub-pc` debconf key in cloud environments.
On the flip side, the GRUB prompt and shim-signed bug linked above could mean a fix should be elsewhere, maybe even in the Azure "gen 1" Ubuntu image. grub-efi* and shim-signed are marked as 'essential' packages in APT.
Apologies, I don't have the full cloud-init logs; I may need to set up a test environment to collect them.
This issue may have existed for months. We only detected it after OS disks on new Azure VMs started to receive new serial IDs, a change I was unable to trace. Other factors triggered this for us recently.