MAAS fails to deploy HPE DL380 Gen10 when virtual install drive is enabled

Bug #1900695 reported by Michał Ajduk
This bug affects 1 person
Affects   Status       Importance   Assigned to   Milestone
MAAS      Incomplete   Medium       Unassigned
curtin    Incomplete   Undecided    Unassigned
grub      New          Undecided    Unassigned

Bug Description

# ENVIRONMENT
MAAS version (SNAP):
maas 2.8.2-8577-g.a3e674063 8980 2.8/stable canonical✓ -

MAAS was cleanly installed. KVM POD setup works.

MAAS status:
bind9 RUNNING pid 9258, uptime 15:13:02
dhcpd RUNNING pid 26173, uptime 15:09:30
dhcpd6 STOPPED Not started
http RUNNING pid 19526, uptime 15:10:49
ntp RUNNING pid 27147, uptime 14:02:18
proxy RUNNING pid 25909, uptime 15:09:33
rackd RUNNING pid 7219, uptime 15:13:20
regiond RUNNING pid 7221, uptime 15:13:20
syslog RUNNING pid 19634, uptime 15:10:48

Machine:
HPE DL380 Gen10
Storage - commissioning output:
"NAME": "sda", (virtual install drive)
  "MODEL": "LUN 00 Media 0",
  "DEVPATH": "/devices/pci0000:00/0000:00:14.0/usb2/2-3/2-3.1/2-3.1:1.0/host0/target0:0:0/0:0:0:0/block/sda",
  "SIZE": "536870912",
"NAME": "sdb", (Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1)
  "MODEL": "LOGICAL VOLUME",
  "PATH": "/dev/sdb",
  "DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:0/block/sdb",
  "SIZE": "960163569664",
"NAME": "sdc", (HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2)
  "MODEL": "LOGICAL VOLUME",
  "PATH": "/dev/sdc",
  "DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:1/block/sdc",
  "SIZE": "480070426624",

# PROBLEM DESCRIPTION

MAAS fails to reboot into the deployed OS. The "Local" menu entry in the MAAS-provided grub.cfg fails to find the bootloader on the local drives, so the machine falls back to the EFI boot order.

# ROOT CAUSE

0) identify install device:
2020-10-20T06:56:37+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb({'grub_device': True, 'id': 'sdb', 'model': 'LOGICAL VOLUME', 'name': 'sdb', 'ptable': 'gpt', 'serial': '600508b1001cade9268ac61a1c3cee4b', 'type': 'disk', 'wipe': 'superblock'})

Grub is configured not to touch NVRAM:
2020-10-20T06:57:01+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Transferred {'grub2': 'grub2 grub2/update_nvram boolean false',

1) MAAS installs grub on the machine:
2020-10-20T06:57:02+00:00 cmp3az2cz20300kv8 cloud-init[2459]: start: cmd-install/stage-curthooks/builtin/cmd-curthooks: Installing packages on target system: ['efibootmgr', 'grub-efi-amd64', 'grub-efi-amd64-signed', 'shim-signed']
2020-10-20T06:57:09+00:00 cmp3az2cz20300kv8 cloud-init[2459]: finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: SUCCESS: Installing packages on target system: ['efibootmgr', 'grub-efi-amd64', 'grub-efi-amd64-signed', 'shim-signed']

2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: start: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: installing grub to target devices
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: setup grub on target /tmp/tmpxf91lob9/target
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Found primary UEFI ESP: sdb-part1
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Found UEFI ESP(s) for grub install: ['sdb-part1']
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb-part1({'device': 'sdb', 'flag': 'boot', 'id': 'sdb-part1', 'name': 'sdb-part1', 'number': 1, 'offset': '4194304B', 'size': '536870912B', 'type': 'partition', 'uuid': '17649a3f-6e9a-445c-a20a-74914d4c5f88', 'wipe': 'superblock'})
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb({'grub_device': True, 'id': 'sdb', 'model': 'LOGICAL VOLUME', 'name': 'sdb', 'ptable': 'gpt', 'serial': '600508b1001cade9268ac61a1c3cee4b', 'type': 'disk', 'wipe': 'superblock'})

2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Applying grub debconf_selections config:
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: {'debconf_selections': {'grub': 'grub-pc grub-efi/install_devices multiselect /dev/disk/by-id/scsi-3600508b1001cade9268ac61a1c3cee4b-part1'}}

2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: installing grub to target=/tmp/tmpxf91lob9/target devices=['/dev/sdb1'] [replace_defaults=None]
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'dpkg', '--print-architecture'] with allowed return codes [0] (capture=True)
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: grub: moved /tmp/tmpxf91lob9/target/etc/default/grub.d/50-cloudimg-settings.cfg out of the way
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: updated /tmp/tmpxf91lob9/target/etc/default/grub to set: GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 nvme_core.multipath=0"
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Using grub install command: grub-install
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Grub install cmds:
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: [['efibootmgr', '-v'], ['dpkg-reconfigure', 'grub-efi-amd64'], ['update-grub'], ['grub-install', '--target=x86_64-efi', '--efi-directory=/boot/efi', '--bootloader-id=ubuntu', '--recheck', '--no-nvram'], ['efibootmgr', '-v']]

2020-10-20T06:57:46+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'grub-install', '--target=x86_64-efi', '--efi-directory=/boot/efi', '--bootloader-id=ubuntu', '--recheck', '--no-nvram'] with allowed return codes [0] (capture=True)

2) MAAS sets up the boot order to ensure PXE boot:
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Setting currently booted 0016 as the first UEFI loader.
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: New UEFI boot order: 0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009

Note that the boot order set is:
0016 - NIC (PXE IPv4)
0000 - fail to system utilities

The device where the OS is installed (Boot000B) is further down in the boot order.

Consult below:
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'efibootmgr', '-o', '0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009'] with allowed return codes [0] (capture=False)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: BootCurrent: 0016
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Timeout: 0 seconds
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: BootOrder: 0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0000* System Utilities
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0001 Embedded UEFI Shell
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0002 Diagnose Error
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0003 Intelligent Provisioning
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0004 Boot Menu
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0005 Network Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0006 View Integrated Management Log
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0007 HTTP Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0008 PXE Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0009 Embedded Diagnostics
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000A* Generic USB Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000B* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000C* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0010* Slot 1 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0012* Slot 1 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0014* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0016* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0018* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001A* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001C* Slot 3 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001E* Slot 3 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0020 Trigger ready-to-boot event
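
To make the consequence of this boot order easier to see, here is a minimal Python sketch that parses an abridged copy of the efibootmgr output above and reports which entry the firmware tries right after the PXE entry. The helper names (parse_efibootmgr, first_fallback) are hypothetical and written only for this illustration; they are not part of curtin or MAAS.

```python
import re

# Abridged efibootmgr output copied from the log above (timestamps removed,
# BootOrder shortened to the entries listed here).
EFIBOOTMGR_OUTPUT = """\
BootCurrent: 0016
BootOrder: 0016,0000,000B,000C,0018
Boot0000* System Utilities
Boot000B* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)
Boot000C* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)
Boot0016* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)
Boot0018* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
"""

# Matches lines such as "Boot000B* <description>" or "Boot0001 <description>".
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_efibootmgr(text):
    """Return (boot_order, entries) parsed from efibootmgr-style output."""
    order, entries = [], {}
    for line in text.splitlines():
        if line.startswith("BootOrder:"):
            numbers = line.split(":", 1)[1].split(",")
            order = [n.strip() for n in numbers if n.strip()]
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(3)
    return order, entries


def first_fallback(order, entries):
    """Return (number, description) of the entry tried after the first one."""
    if len(order) < 2:
        return None
    number = order[1]
    return number, entries.get(number, "<unknown>")


order, entries = parse_efibootmgr(EFIBOOTMGR_OUTPUT)
print(first_fallback(order, entries))
print("position of 000B in BootOrder:", order.index("000B"))
```

With this input the entry tried after PXE is 0000 (System Utilities), and both RAID logical drives come only after it.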

3) Finalize configuration:
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: finish: cmd-install/stage-late/maas: SUCCESS: running 'wget --no-proxy http://10-216-240-0--23.maas-internal:5248/MAAS/metadata/latest/by-id/dfkxqh/ --post-data op=netboot_off -O /dev/null'

4) The server is instructed to reboot. During the reboot it uses the MAAS-provided grub.cfg:
2020-10-20 06:59:36 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-d4:f5:ef:02:28:94 requested by 10.216.240.106

MAAS provides grub configuration as follows:

ubuntu@inf1az1cz202904rz:~$ curl tftp://10.216.240.1/grub/grub.cfg-d4:f5:ef:02:28:94
set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    for bootloader in \
            boot/bootx64.efi \
            ubuntu/shimx64.efi \
            ubuntu/grubx64.efi \
            centos/shimx64.efi \
            centos/grubx64.efi \
            redhat/shimx64.efi \
            redhat/grubx64.efi \
            rhel/shimx64.efi \
            rhel/grubx64.efi \
            red/grubx64.efi \
            Microsoft/Boot/bootmgfw.efi; do
        search --set=root --file /efi/$bootloader
        if [ $? -eq 0 ]; then
            chainloader /efi/$bootloader
            boot
        fi
    done
    # If no bootloader is found exit and allow the next device to boot.
    exit
}

Unfortunately this configuration fails to find a bootloader, so GRUB exits and the firmware moves on to the next boot entry, which is Boot0000* System Utilities.
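
The control flow of the 'Local' menu entry can be modelled in a few lines of Python. This is only a sketch of the logic, not GRUB itself: the devices argument is a hypothetical stand-in for what GRUB can read (a device name mapped to the set of file paths visible on it, empty when GRUB reports "Filesystem is unknown"). The two device maps are taken from the GRUB sessions shown in this report, with VID enabled and with VID disabled.

```python
# Bootloader candidates, in the same order as in the MAAS-provided grub.cfg.
BOOTLOADERS = [
    "boot/bootx64.efi",
    "ubuntu/shimx64.efi",
    "ubuntu/grubx64.efi",
    "centos/shimx64.efi",
    "centos/grubx64.efi",
    "redhat/shimx64.efi",
    "redhat/grubx64.efi",
    "rhel/shimx64.efi",
    "rhel/grubx64.efi",
    "red/grubx64.efi",
    "Microsoft/Boot/bootmgfw.efi",
]


def find_bootloader(devices):
    """Model of the 'Local' entry: return (device, path) or None.

    devices maps a GRUB device name to the set of file paths GRUB can read
    on it. An unreadable device is represented by an empty set.
    """
    for loader in BOOTLOADERS:
        path = "/efi/" + loader
        # Equivalent of: search --set=root --file /efi/$bootloader
        for device, files in devices.items():
            if path in files:
                return device, path  # chainloader + boot would happen here
    # No bootloader found: the grub.cfg calls 'exit' and the firmware
    # continues with the next entry in BootOrder.
    return None


# With VID enabled GRUB only listed (hd0) and (hd0,gpt1), both unreadable.
vid_enabled = {"hd0": set(), "hd0,gpt1": set()}

# With VID disabled the ESP on (hd0,gpt1) is visible.
vid_disabled = {
    "hd0": set(),
    "hd0,gpt2": set(),
    "hd0,gpt1": {"/efi/boot/bootx64.efi"},
    "hd1": set(),
}

print(find_bootloader(vid_enabled))
print(find_bootloader(vid_disabled))
```

In the first case the function returns None, which corresponds to the 'exit' at the end of the menu entry.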

In the GRUB shell, the following variables are set:
grub> set
grub_platform=efi
cmd_path=(tftp,10.216.240.1)
net_default_interface=efinet3
net_default_ip=10.216.240.106
net_default_mac=d4:f5:ef:02:28:94
net_default_server=10.216.240.1
net_efinet3_boot_file=bootx64.efi
net_efinet3_domain=mgt.tlc.cloud
net_efinet3_ip=10.216.240.106
net_efinet3_mac=d4:f5:ef:02:28:94
net_efinet3_next_server=10.216.240.1
package_version=2.02-2ubuntu8.18
prefix=(tftp,10.216.240.1)/grub
pxe_default_server=10.216.240.1
root=tftp,10.216.240.1

grub> ls
(memdisk) (hd0) (hd0,gpt1)
grub> ls (hd0)
(hd0): Filesystem is unknown.
grub> ls (hd0,gpt1)
(hd0,gpt1): Filesystem is unknown.
grub> ls (memdisk)
(memdisk): Filesystem is fat.
grub> ls (memdisk)/
grub.cfg
grub> cat (memdisk)/grub.cfg
if [ -e $prefix/x86_64-efi/grub.cfg; ] then
 source $prefix/x86_64-efi/grub.cfg
else
 source $prefix/grub.cfg
fi

Trying to run the MAAS provided config fails:
grub> search --set=root --file /efi/boot/bootx64.efi
error: no such device: /efi/boot/bootx64.efi

GRUB does not see the logical volumes (sdb, sdc) hosted on the hardware RAID controller when the Virtual Install Disk (VID) is enabled.

After disabling the VID (Intelligent Provisioning->BIOS/Platform Configuration(RBSU)->USB options->Virtual Install Disk-Disable), GRUB lists all the partitions:
grub> ls
(hd0) (hd0,gpt2) (hd0,gpt1) (hd1)
grub> search --set=root --file /efi/boot/bootx64.efi
 hd0,gpt1
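
Since the problem depends on whether the virtual install disk is present, it can help to spot it in the commissioning data before deploying. The sketch below is a heuristic only: it assumes the VID always shows up as a USB-attached block device, which is true for the sda entry in this report but is not guaranteed elsewhere. The function name is_usb_attached is hypothetical, and the sample records are copied from the commissioning output at the top of this report.

```python
import re

# Block devices as reported in the commissioning output of this machine.
DISKS = [
    {
        "NAME": "sda",
        "MODEL": "LUN 00 Media 0",
        "DEVPATH": "/devices/pci0000:00/0000:00:14.0/usb2/2-3/2-3.1/2-3.1:1.0/host0/target0:0:0/0:0:0:0/block/sda",
        "SIZE": "536870912",
    },
    {
        "NAME": "sdb",
        "MODEL": "LOGICAL VOLUME",
        "DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:0/block/sdb",
        "SIZE": "960163569664",
    },
    {
        "NAME": "sdc",
        "MODEL": "LOGICAL VOLUME",
        "DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:1/block/sdc",
        "SIZE": "480070426624",
    },
]


def is_usb_attached(devpath):
    """Heuristic: True if a sysfs path component looks like a USB bus (usbN)."""
    return any(re.fullmatch(r"usb\d+", part) for part in devpath.split("/"))


for disk in DISKS:
    size_mib = int(disk["SIZE"]) // 2**20
    marker = "USB-attached (possible VID)" if is_usb_attached(disk["DEVPATH"]) else "not USB"
    print(f'{disk["NAME"]}: {disk["MODEL"]}, {size_mib} MiB, {marker}')
```

On this data only sda (512 MiB, model "LUN 00 Media 0") is flagged.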

Lee Trager (ltrager) wrote :

It looks like the deployment works; what's failing is booting into the deployed system. There appear to be two bugs here:

1. When a deployment occurs Curtin configures the system to boot locally after trying to boot over the network. This doesn't appear to be happening.
2. GRUB isn't able to see any of the local disks.

When GRUB fails to find a local bootloader it falls back on booting the next configured device. This should be the local system, but because Curtin never configures local boot, the system firmware is started instead.

Ryan Harper (raharper) wrote :

Thank you for filing a bug. Can you attach the curtin config/install logs?

https://discourse.maas.io/t/getting-curtin-debug-logs/169

There are several grub/EFI related fixes that are in curtin master; I'm not sure which version of curtin is in your stable MAAS snap; you might try testing with:

https://discourse.maas.io/t/maas-is-changing-my-boot-order/3491/5

which used:

snap refresh maas --channel=latest/edge/curtin-stable

 to get the most recent curtin with maas.

Changed in curtin:
status: New → Incomplete
Pedro Guimarães (pguimaraes) wrote :

Hi, I am seeing the same issue on MAAS + HPE Synergy Gen 10.

I can see Michal found a solution for his issue. I will leave what I've tested for the discussion.

By the end of the curtin run, BIOS is set as the second boot option.
Looking at the code of the Bionic version we generally run, I can see there is not much logic in it to customize the boot order:
https://git.launchpad.net/curtin/tree/curtin/commands/curthooks.py?h=ubuntu/bionic&id=a1d98115e5dc2b525a3f7556f4f97dd48693f608

However, on newer 20.2, there is an extra option: https://github.com/canonical/curtin/blob/81144052c64a3d22edb68ebbd11483b463e62656/curtin/commands/curthooks.py#L520
reorder_uefi_force_fallback

In my case, after setting that, I can see the boot order got rearranged on curtin logs:
Before: https://pastebin.canonical.com/p/mMp7BvbRWG/
After: https://pastebin.canonical.com/p/FCCQtxQH29/

For that, first I had to upgrade current curtin (on each infra node):
sudo add-apt-repository ppa:curtin-dev/stable
sudo apt update
sudo apt install --only-upgrade curtin-common

Then I had to add the fallback option to /etc/maas/preseeds/curtin_userdata:
reorder_uefi_force_fallback: True

According to the docs: https://curtin.readthedocs.io/en/latest/topics/config.html

I believe we should leave this option as True on preseed.
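
To illustrate the kind of boot order being aimed for (network entries first, local disks next, firmware utilities last), here is a small Python sketch. It is not curtin's implementation of reorder_uefi_force_fallback and does not claim to reproduce it; the classify and reorder helpers and their substring matching are hypothetical and only fit the entry descriptions seen in this report.

```python
# Abridged boot entries and boot order taken from the efibootmgr output
# in the bug description.
ENTRIES = {
    "0000": "System Utilities",
    "000A": "Generic USB Boot",
    "000B": "Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)",
    "000C": "Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)",
    "0016": "Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)",
    "0018": "Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)",
}
CURRENT_ORDER = ["0016", "0000", "000B", "000C", "0018", "000A"]


def classify(description):
    """Rank an entry: 0 = network, 1 = local disk, 2 = everything else.

    The substrings are assumptions that match this machine's descriptions.
    """
    if "NIC (" in description:
        return 0
    if "Logical Drive" in description:
        return 1
    return 2


def reorder(order, entries):
    """Network entries first, then disks, then the rest.

    sorted() is stable, so the relative order within each group is kept.
    """
    return sorted(order, key=lambda number: classify(entries.get(number, "")))


new_order = reorder(CURRENT_ORDER, ENTRIES)
print(",".join(new_order))
```

With such an order, a failed or skipped network boot would fall through to the RAID logical drives before reaching System Utilities.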

Alberto Donato (ack)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Jerzy Husakowski (jhusakowski) wrote :

Waiting for curtin logs

Changed in maas:
status: Triaged → Incomplete