cc_grub_dpkg updates grub-pc or grub-efi debconf keys, but both can become incorrect on BIOS-booted Azure Ubuntu

Bug #2013419 reported by Adrien Siebert
This bug affects 1 person
Affects: cloud-init
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Platform: Azure (generation 1 VMs)
Image used: Ubuntu Server, SKU 20_04-lts (gen 1)
(cloud-init 22.4.2-0ubuntu0~20.04.2, no cloud-init customization)

Azure generation 1 VMs boot in BIOS mode. The Ubuntu image ships with both BIOS and UEFI boot support installed, and cloud-init updates some debconf keys, presumably to avoid mismatches when boot packages are upgraded on new machines:
https://github.com/canonical/cloud-init/blob/ubuntu/22.4.2-0ubuntu0_20.04.2/cloudinit/config/cc_grub_dpkg.py#L148-L149

Even when the VM is booted in BIOS mode, upgrading EFI packages (e.g. grub-efi-amd64-signed or shim-signed) causes the debconf `grub-efi/install_devices` key to be updated.
If the ID of the disk where GRUB is installed later changes (one scenario below), cloud-init only updates the `grub-pc` debconf keys (link above). The stale `grub-efi` key can cause later EFI package upgrades to fail until a user with a shell confirms a dpkg configuration prompt.

[scenario]
Sample scenario where we encountered this issue, using Packer to build a custom VM image:
* Packer creates a BIOS VM from the base Ubuntu 20.04 image (gen 1).
* cloud-init updates the `grub-pc` key:
```
2023-02-26 08:40:19,507 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d534654202020204f19e7ec574d624f9e27ff405f501bc0','false'
```
* Packages get upgraded. Upgrading the EFI packages results in "Installing grub to /boot/efi" (dpkg logs) and sets the debconf `grub-efi/install_devices` key, which now points at the Packer VM disk.
* Customized VM gets saved by Packer as an image.

...

* Later, we spin up gen 1 (BIOS) VMs from that image. Each VM's root disk has its own serial ID.
  (GRUB partition = scsi-14d53465420202020da118904a05ed740b387a530ae506ac2-part15)
* cloud-init updates the `grub-pc` key:
```
2023-03-07 00:25:44,780 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d534654202020200e6290db5a56ef43ba1f16eef596d653','false'
```
* Later, a headless `apt upgrade` breaks:
```
Setting up shim-signed (1.40.9+15.7-0ubuntu1) ...
mount: /var/lib/grub/esp: special device /dev/disk/by-id/scsi-14d534654202020204f19e7ec574d624f9e27ff405f501bc0-part15 does not exist.
```
```
# debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"
* grub-efi/install_devices: /dev/disk/by-id/scsi-14d534654202020204f19e7ec574d624f9e27ff405f501bc0-part15
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d534654202020200e6290db5a56ef43ba1f16eef596d653
```
[/scenario]
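
The mismatch in the scenario above can be detected before an upgrade runs. The following is a minimal sketch, not part of cloud-init: it parses the text printed by `debconf-show grub-pc` and reports any `install_devices` entry whose device path does not exist on the current machine. The function names `parse_install_devices` and `stale_devices` are hypothetical, and the assumption that multiple devices are comma-separated is mine.

```python
import os
import re

# Matches lines such as "* grub-efi/install_devices: /dev/disk/by-id/...-part15".
# The leading "*" (value was seen by the user) is optional.
KEY_RE = re.compile(r"^\*?\s*(grub-(?:pc|efi)/install_devices):\s*(.*)$")


def parse_install_devices(debconf_output):
    """Map each install_devices key to its list of device paths."""
    found = {}
    for line in debconf_output.splitlines():
        match = KEY_RE.match(line.strip())
        if match:
            key, value = match.groups()
            # Assumption: several devices are separated by commas.
            found[key] = [dev.strip() for dev in value.split(",") if dev.strip()]
    return found


def stale_devices(found, exists=os.path.exists):
    """Return {key: [paths that do not exist on this machine]}."""
    stale = {}
    for key, devices in found.items():
        missing = [dev for dev in devices if not exists(dev)]
        if missing:
            stale[key] = missing
    return stale
```

On an affected VM, passing the output of `debconf-show grub-pc` through `parse_install_devices` and then `stale_devices` should flag only the `grub-efi/install_devices` key, since the `grub-pc` key has already been refreshed by cloud-init.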

In this situation, when an `apt upgrade` that touches an EFI package (or `dpkg --configure -a`, once the system is broken) is run in an interactive shell, a user can manually confirm this prompt:
```
┌───────────────────────────────────┤ Configuring shim-signed ├────────────────────────────────────┐
│ The GRUB boot loader was previously installed to a disk that is no longer present, or whose      │
│ unique identifier has changed for some reason. It is important to make sure that the installed   │
│ GRUB core image stays in sync with GRUB modules and grub.cfg. Please check again to make sure    │
│ that GRUB is written to the appropriate boot devices.                                            │
│                                                                                                  │
│ GRUB install devices:                                                                            │
│                                                                                                  │
│ [*] /dev/sda15 (111 MB; /boot/efi) on 32213 MB Virtual_Disk                                      │
│                                                                                                  │
│                                                                                                  │
│                                               <Ok>                                               │
│                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
```
Accepting this prompt appears to update/fix the `grub-efi` debconf key.
In my testing, `DEBIAN_FRONTEND=noninteractive` suppresses the prompt, but the upgrade then fails with the aforementioned `mount: /var/lib/grub/esp...` error. This may be related to https://bugs.launchpad.net/ubuntu/+source/shim-signed/+bug/1940723, a bug that also involves Azure VMs.

Reminder: this is all on BIOS-booted VMs; as far as I know, UEFI boot is never involved here.

This bug is a follow-up to a quick discussion on https://github.com/canonical/cloud-init/commit/2fd24cc8cb2e2d1b0e00eb8c66573722523a91e7
That recent change introduced support for updating the grub debconf keys on EFI-booted machines, but the choice depends on the boot mode: if EFI-booted, update debconf `grub-efi`, otherwise update `grub-pc`.
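
To make the boot-mode rule concrete, here is a minimal sketch of that kind of selection. It is an illustration only and not the code from the linked commit; the function name `debconf_key_for_boot_mode` and the `efi_dir` parameter are hypothetical, and the only thing taken from this report is that the boot mode is inferred from the presence of `/sys/firmware/efi`.

```python
import os


def debconf_key_for_boot_mode(efi_dir="/sys/firmware/efi"):
    """Pick exactly one debconf key, based on how the machine booted."""
    if os.path.isdir(efi_dir):
        # The kernel exposes this directory only on EFI-booted systems.
        return "grub-efi/install_devices"
    return "grub-pc/install_devices"
```

On a BIOS-booted gen 1 VM this returns only the `grub-pc` key, so a `grub-efi` value inherited from an image is never touched.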

This unfortunately doesn't solve the case above, where BIOS machines have EFI configured and an intermediate/customized image is used.

My uneducated guess is that we may want cloud-init to update either or both debconf keys depending on whether BIOS and/or EFI support is *installed*, instead of checking the current boot mode (= presence of `/sys/firmware/efi`).
I do not know how to detect this. Presence of a grub-efi* package? Presence of /boot/efi?
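
One way to picture this suggestion is the sketch below, which chooses keys from what is installed rather than from the boot mode. Everything in it is an assumption for discussion: the helper names are hypothetical, treating a mounted `/boot/efi` or an installed `grub-efi-amd64-signed` as "EFI support installed" is only one of the heuristics raised above, and using the `grub-pc` package as the BIOS indicator is my own guess. The only external call is `dpkg-query -W -f='${Status}'`.

```python
import os
import subprocess


def package_installed(name):
    """True if dpkg reports the package as fully installed."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "-f=${Status}", name],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # dpkg-query is not available (not a dpkg-based system).
        return False
    return result.returncode == 0 and result.stdout.strip() == "install ok installed"


def keys_to_update(bios_installed, efi_installed):
    """Return every debconf key that may hold a device path needing refresh."""
    keys = []
    if bios_installed:
        keys.append("grub-pc/install_devices")
    if efi_installed:
        keys.append("grub-efi/install_devices")
    return keys


def detect_keys(efi_mount="/boot/efi"):
    """Heuristic detection; the package names and mount point are assumptions."""
    bios = package_installed("grub-pc")
    efi = os.path.ismount(efi_mount) or package_installed("grub-efi-amd64-signed")
    return keys_to_update(bios, efi)
```

With both kinds of support installed, as on the Azure gen 1 images described here, `keys_to_update(True, True)` yields both keys, which is the behaviour the suggestion asks for.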

Adrien Siebert (r-asiebert) wrote :

(Additional notes)

Disclaimer: I've discovered these cloud-init details while troubleshooting this issue involving Azure, Ubuntu, GRUB, EFI, and Packer. My knowledge of cloud-init and grub EFI support is fresh and very limited.

This seems like a cloud-init issue given that cc_grub_dpkg.py exists in the first place to patch the `grub-pc` debconf key in cloud environments.
On the flip side, the GRUB prompt and shim-signed bug linked above could mean a fix should be elsewhere, maybe even in the Azure "gen 1" Ubuntu image. grub-efi* and shim-signed are marked as 'essential' packages in APT.

Apologies, I don't have the full cloud-init logs; I may need to set up a test environment to collect them.

This issue may have existed for months. We only detected it after OS disks on new Azure VMs started to receive new serial IDs, a change I was unable to trace. Other factors triggered this for us recently.

Brett Holman (holmanb) wrote :

Thanks for reporting this bug, Adrien. I'm still working to understand what is happening here. In the meantime, I'm curious whether disabling this module might resolve your issue as a workaround.

Providing the following cloud-config to your instance on first boot would disable this module:

```
#cloud-config
grub_dpkg:
    enabled: false
```

Chad Smith (chad.smith)
Changed in cloud-init:
status: New → Incomplete
Chad Smith (chad.smith) wrote :

Thanks for the links to both the related bug discussions and the prior cloud-init commits in this space. I've set the bug status to Incomplete in the hope of getting feedback from Adrien on either:
 * cloud-init logs for the failed install - or -
 * testing out Brett's suggested cloud-config from comment #2

Please do set this bug back to 'New' status above once you are able to provide feedback on either of those options, to bump this back onto our radar for triage review and further debugging.

Adrien Siebert (r-asiebert) wrote :

@Brett, grub_dpkg is behaving properly, so I wouldn't disable it.
Arguably, it doesn't do 'enough' to fix such situations: it only fixes either the grub-pc (BIOS) or the grub-efi debconf key (only grub-pc before https://github.com/canonical/cloud-init/pull/2029), and in our case both the BIOS and EFI keys need an update.

I've got some time today and may try to get a clean VM with the issue to collect logs.

Adrien Siebert (r-asiebert) wrote :

I've reproduced this on Azure with an updated image, Ubuntu Jammy / 22.04 with cloud-init 23.1.1-0ubuntu0~22.04.1.

This should be a much easier example to follow along -- the cloud-init log TARs are included for the two VMs involved.
Summary of steps done in the Azure Portal:
- First VM created from Jammy image at 2023-04-10 18:19
- Simulated a shim-signed package upgrade -- grub-efi debconf gets set.
- Captured this VM as an image
- Second VM created from custom image at 2023-04-10 18:34
- Simulated shim-signed upgrade -- breaks on grub-efi disk mismatch.

Notes about the TAR archives:
* The logs from the second VM contain the logs from the first VM that were packaged in the image.
* Some Azure metadata present in the cloud-init data (e.g. Azure subscription) is redacted.
* Logs may contain additional dpkg/apt activity from me poking around.

###
### Accompanying notes and logs:
###

### The first VM was created from image "Ubuntu Server 22.04 LTS - Gen1" (publisher/offer/sku: canonical / 0001-com-ubuntu-server-jammy / 22_04-lts) with default settings ("Premium SSD" OS disk) except for restricted networking.
### (2023-04-10 18:19, adrien-test-cloud-init-imaging)

```
root@adrien-test-cloud-init-imaging:/home/azureuser# ls -la /dev/disk/by-id/ | grep part15; debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"; grep "Setting grub debconf-set-selections" /var/log/cloud-init.log
lrwxrwxrwx 1 root root 11 Apr 10 18:19 scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f-part15 -> ../../sdb15
lrwxrwxrwx 1 root root 11 Apr 10 18:19 scsi-360022480e0488e713b6c9c7255dd721f-part15 -> ../../sdb15
lrwxrwxrwx 1 root root 11 Apr 10 18:19 wwn-0x60022480e0488e713b6c9c7255dd721f-part15 -> ../../sdb15
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f
  grub-efi/install_devices:
2023-04-10 18:20:03,717 - cc_grub_dpkg.py[DEBUG]: Setting grub debconf-set-selections with '/dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f','false'
```
Nominal: grub-efi debconf not present yet, grub-pc from cloud-init.

### Simulated our Packer build installing a shim-signed upgrade (headless `apt upgrade`), which sets the `grub-efi` debconf key
```
root@adrien-test-cloud-init-imaging:/home/azureuser# DEBIAN_FRONTEND=noninteractive dpkg-reconfigure shim-signed
Trying to migrate /boot/efi into esp config
Installing grub to /boot/efi.
Installing for x86_64-efi platform.
grub-install: warning: EFI variables cannot be set on this system.
grub-install: warning: You will have to complete the GRUB setup manually.
Installation finished. No error reported.

root@adrien-test-cloud-init-imaging:/home/azureuser# debconf-show grub-pc | egrep "grub-efi/install_devices:|grub-pc/install_devices:"
* grub-pc/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f
* grub-efi/install_devices: /dev/disk/by-id/scsi-14d53465420202020e0488e713b6c5e4286129c7255dd721f-part15
```
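
In this healthy state the two keys refer to the same disk: the `grub-efi` value is the `grub-pc` disk ID plus a `-part15` suffix. A minimal sketch of a consistency check built on that observation follows; the helper names `parent_disk` and `keys_agree` are hypothetical, and the check assumes by-id paths that name partitions with a `-partN` suffix, as in the output above.

```python
import re

# by-id partition links end in "-part<N>", e.g. "...721f-part15".
PART_SUFFIX = re.compile(r"-part\d+$")


def parent_disk(device):
    """Strip a trailing "-partN" so a partition maps to its whole disk."""
    return PART_SUFFIX.sub("", device)


def keys_agree(grub_pc_device, grub_efi_device):
    """True when both debconf values point at the same underlying disk."""
    return parent_disk(grub_pc_device) == parent_disk(grub_efi_device)
```

For the values shown above this check passes, while the pair from the original Packer scenario (a `grub-pc` disk ending in `d653` and a `grub-efi` partition on the disk ending in `1bc0`) fails it.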

### Captured the VM image as adrien-test-cloud-init-imaging-image-20230410112812, created second VM from it
### (2023-04-10 18:34, adrien-test-cloud-init-from-custom-image)

```
root@adrien-test-cloud-init-fro...
```

James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Incomplete → Expired