Activity log for bug #1877491

Date Who What changed Old value New value Message
2020-05-08 01:56:19 Matthew Ruffell bug added bug
2020-05-08 01:56:26 Matthew Ruffell cloud-init: status New In Progress
2020-05-08 01:56:29 Matthew Ruffell cloud-init: assignee Matthew Ruffell (mruffell)
2020-05-08 01:56:38 Matthew Ruffell tags sts
2020-05-08 01:57:08 Matthew Ruffell attachment added screenshot of dpkg prompt https://bugs.launchpad.net/cloud-init/+bug/1877491/+attachment/5368084/+files/Screenshot%20from%202020-04-14%2014-39-11.png
2020-05-08 01:57:32 Matthew Ruffell description
New value (the old value was identical except that the screenshot link [3] was still blank):

Currently, we populate the debconf database variable grub-pc/install_devices by checking to see if a device is present in a hardcoded list [1] of directories:

- /dev/sda
- /dev/vda
- /dev/xvda
- /dev/sda1
- /dev/vda1
- /dev/xvda1

[1] https://github.com/canonical/cloud-init/blob/master/cloudinit/config/cc_grub_dpkg.py

While this is a simple, elegant solution, the hardcoded list does not match real-world conditions, where grub is installed to a disk which is not on this list. The primary example is any cloud which uses NVMe storage, such as AWS c5 instances. /dev/nvme0n1 is not on the above list, and in this case we fall back to a hardcoded /dev/sda value for grub-pc/install_devices.

The problem is that the grub postinstall script [2] checks whether the value from grub-pc/install_devices exists, and if it doesn't, shows the user an interactive dpkg prompt where they must select the disk to install grub to. See the screenshot [3].

[2] https://paste.ubuntu.com/p/5FChJxbk5K/
[3] https://launchpadlibrarian.net/478771797/Screenshot%20from%202020-04-14%2014-39-11.png

This breaks scripts that don't set DEBIAN_FRONTEND=noninteractive, as they hang waiting for the user to input a choice. I propose that we modify the cc_grub_dpkg module to be more robust at selecting the correct disk grub is installed to.

Why not simply add an extra directory to the hardcoded list? Let's take NVMe storage as an example again. On a c5d.large instance I spun up just now, lsblk returns:

$ lsblk
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0   18M  1 loop /snap/amazon-ssm-agent/1566
loop1          7:1    0 93.8M  1 loop /snap/core/8935
nvme0n1      259:0    0 46.6G  0 disk
nvme1n1      259:1    0    8G  0 disk
└─nvme1n1p1  259:2    0    8G  0 part /

We cannot hardcode /dev/nvme0n1, as NVMe naming conventions are not stable in the kernel: on some boots the 8G disk will be /dev/nvme0n1, and on others it will be /dev/nvme1n1.

Instead, I propose a slightly more complex, but still well-tested and well-defined, method for determining the disk that grub is installed to. The procedure closely follows how the postinst.in script [2] for grub2 determines which disks should be selected, and uses the exact same commands, just run as subprocesses (a Python sketch follows this entry):

1) Check whether /usr/sbin/grub-mkdevicemap exists. If it does, grub has been installed. If not, we are in a container and can exit with empty values.
2) Execute "grub-mkdevicemap -n -m - | cut -f2" to get a list of valid grub install targets.
3) Determine whether the system is EFI or BIOS based by checking for the existence of /sys/firmware/efi. If BIOS, go to 4); if EFI, go to 5).
4) If BIOS, iterate over each drive in the list from 2) and use dd to pull the first 512 bytes of the MBR, searching for the word "GRUB". The command used is: dd if="$device" bs=512 count=1 2>/dev/null | grep -aq GRUB. Select the disk which contains the "GRUB" string.
5) If EFI, find the disk which contains the /boot/efi partition by parsing mountpoints, e.g. "findmnt -o SOURCE -n /boot/efi". From there, check whether we can simply drop the partition number, by cross-referencing the list from 2). If not, we are likely on bare metal and need to translate the disk to a /dev/disk/by-id value to match what grub-mkdevicemap generates.

Most values written to grub-pc/install_devices will be in /dev/disk/by-id format, as produced by grub-mkdevicemap. This is robust to unstable kernel device naming conventions.

On Nitro, this returns: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0179fff411dd211f0
On Xen, this returns: /dev/xvda
On a typical QEMU/KVM machine, this returns: /dev/vda
On my personal desktop computer, this returns: /dev/disk/by-id/ata-WDC_WD5000AAKX-00PWEA0_WD-WMAYP3497618

I believe this method is much more robust at detecting the correct grub install disk than the previous hardcoded list. It is more complex, and I accept that it will increase boot time by a few tenths of a second as it runs these programs. cc_grub_dpkg only runs once, on instance creation, so this shouldn't be a major problem.

I have tested this on AWS on Xen and Nitro, on KVM with BIOS and EFI based instances, in LXC, and on bare metal with a BIOS based MAAS machine. All give the correct results in my testing. Due to the complexity of the code, I anticipate this will need a few revisions to get right, so please let me know if something needs to be changed.

TESTING:
You can fetch grub-pc/install_devices with:
$ echo get grub-pc/install_devices | sudo debconf-communicate grub-pc
Reset with:
$ echo reset grub-pc/install_devices | sudo debconf-communicate grub-pc
2020-05-10 23:18:19 Dominique Poulain bug added subscriber Dominique Poulain
2020-05-11 17:42:24 Dan Streetman bug added subscriber Dan Streetman
2020-05-12 18:05:53 Matthieu Clemenceau bug added subscriber Julian Andres Klode
2020-05-14 05:06:17 Matthew Ruffell description
Old value: the description as revised on 2020-05-08 (shown in full above).

New value:

Currently, we populate the debconf database variable grub-pc/install_devices by checking to see if a device is present in a hardcoded list [1] of directories:

- /dev/sda
- /dev/vda
- /dev/xvda
- /dev/sda1
- /dev/vda1
- /dev/xvda1

[1] https://github.com/canonical/cloud-init/blob/master/cloudinit/config/cc_grub_dpkg.py

While this is a simple, elegant solution, the hardcoded list does not match real-world conditions, where grub is installed to a disk which is not on this list. The primary example is any cloud which uses NVMe storage, such as AWS c5 instances. /dev/nvme0n1 is not on the above list, and in this case we fall back to a hardcoded /dev/sda value for grub-pc/install_devices.

The problem is that the grub postinstall script [2] checks whether the value from grub-pc/install_devices exists, and if it doesn't, shows the user an interactive dpkg prompt where they must select the disk to install grub to. See the screenshot [3].

[2] https://paste.ubuntu.com/p/5FChJxbk5K/
[3] https://launchpadlibrarian.net/478771797/Screenshot%20from%202020-04-14%2014-39-11.png

This breaks scripts that don't set DEBIAN_FRONTEND=noninteractive, as they hang waiting for the user to input a choice. I propose that we modify the cc_grub_dpkg module to be more robust at selecting the correct disk grub is installed to.

Why not simply add an extra directory to the hardcoded list? Let's take NVMe storage as an example again. On a c5d.large instance I spun up just now, lsblk returns:

$ lsblk
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1      259:0    0 46.6G  0 disk
nvme1n1      259:1    0    8G  0 disk
└─nvme1n1p1  259:2    0    8G  0 part /

We cannot hardcode /dev/nvme0n1, as NVMe naming conventions are not stable in the kernel: on some boots the 8G disk will be /dev/nvme0n1, and on others it will be /dev/nvme1n1.

Instead, I propose that we determine which disk grub has been installed to by following the grub2 debian/postinst.in script, implementing the algorithm behind its usable_partitions(), device_to_id() and available_ids() functions [4] (a Python sketch follows this entry).

[4] https://paste.ubuntu.com/p/vKFNSwNyhP/

This uses grub-probe to find the root disk where the /boot directory is located, and then turns the disk name into a /dev/disk/by-id/ value. This is robust to unstable kernel device naming conventions.

On Nitro, this returns: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0179fff411dd211f0
On Xen, this returns: /dev/xvda
On a typical QEMU/KVM machine, this returns: /dev/vda
On my personal desktop computer, this returns: /dev/disk/by-id/ata-WDC_WD5000AAKX-00PWEA0_WD-WMAYP3497618

I have tested this on AWS on Xen and Nitro, on KVM with BIOS and EFI based instances, in LXC, and on bare metal with a BIOS based MAAS machine. All give the correct results in my testing.

TESTING:
You can fetch grub-pc/install_devices with:
$ echo get grub-pc/install_devices | sudo debconf-communicate grub-pc
Reset with:
$ echo reset grub-pc/install_devices | sudo debconf-communicate grub-pc
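Below is a rough Python sketch of the grub-probe approach described in the new value. It approximates the grub2 postinst device_to_id() behaviour by picking any /dev/disk/by-id symlink that resolves to the same disk; the probe_idevs name and that matching strategy are assumptions, not the merged cc_grub_dpkg code.

    import os
    import subprocess

    def probe_idevs():
        """Return the disk holding /boot, as a /dev/disk/by-id path if possible."""
        # Ask grub-probe for the disk backing the filesystem that contains /boot.
        disk = subprocess.check_output(
            ["grub-probe", "--target=disk", "/boot"],
            universal_newlines=True).strip()

        # Prefer a persistent /dev/disk/by-id symlink that resolves to the
        # same device as the kernel name (e.g. /dev/nvme1n1).
        by_id = "/dev/disk/by-id"
        if os.path.isdir(by_id):
            for name in sorted(os.listdir(by_id)):
                path = os.path.join(by_id, name)
                if os.path.realpath(path) == os.path.realpath(disk):
                    return path

        # Fall back to the kernel device name (e.g. /dev/xvda, /dev/vda).
        return disk

With this approach a Nitro instance yields the Amazon EBS by-id alias, while Xen and virtio disks, which typically have no by-id symlink, fall back to /dev/xvda or /dev/vda, matching the results reported above.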
2020-06-02 16:00:08 Dariusz Gadomski bug added subscriber Dariusz Gadomski
2020-06-09 21:58:19 Matthew Ruffell cloud-init: status In Progress Fix Committed
2020-06-09 21:58:37 Matthew Ruffell summary cc_grub_dpkg: determine idevs in a more robust manner with grub-mkdevicemap cc_grub_dpkg: determine idevs in a more robust manner with grub-probe
2020-07-30 21:22:27 C de-Avillez bug added subscriber C de-Avillez
2020-07-31 04:15:08 Juancho bug added subscriber Juancho
2020-08-01 00:57:32 Nivedita Singhvi bug added subscriber Nivedita Singhvi
2020-08-25 19:31:30 James Falcon cloud-init: status Fix Committed Fix Released
2023-05-12 06:38:25 James Falcon bug watch added https://github.com/canonical/cloud-init/issues/3679
2023-05-17 12:47:43 Dan Streetman removed subscriber Dan Streetman