grub-install: error: /var/lib/grub/esp doesn't look like an EFI partition.

Bug #1907280 reported by Jason Hobbs
This bug affects 4 people
Affects   Status       Importance   Assigned to   Milestone
MAAS      Incomplete   Undecided    Unassigned
curtin    New          Undecided    Unassigned

Bug Description

20.04 installation via maas is failing on my HP gen9 machine with this error:

grub-install: error: /var/lib/grub/esp doesn't look like an EFI partition.

full maas install log:
https://paste.ubuntu.com/p/nbK537BTCV/

curtin config:
https://paste.ubuntu.com/p/MjHTVPgt4W/

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Does the created ESP have a typecode of 0xef? Reading the logs, I'm not sure the ESP is really set up as an ESP with the right partition typecode.
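A rough sketch of the typecode check being asked about, in Python. It assumes the partition type has already been read as a string, for example from the `PARTTYPE` column of `lsblk`, which reports a GUID on GPT disks and a hex code on MBR disks. The function name `looks_like_esp_type` is made up for illustration and is not part of curtin or MAAS.

```python
# Illustrative only: not curtin code. The helper name is hypothetical.
ESP_GPT_GUID = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"  # EFI System Partition type GUID (GPT)
ESP_MBR_TYPE = 0xEF  # EFI System Partition type code (MBR)


def looks_like_esp_type(parttype: str) -> bool:
    """Return True if a partition type string identifies an ESP.

    Accepts either a GPT type GUID or an MBR hex code such as "0xef".
    """
    value = parttype.strip().lower()
    if value == ESP_GPT_GUID:
        return True
    try:
        # int() with base 16 accepts both "ef" and "0xef".
        return int(value, 16) == ESP_MBR_TYPE
    except ValueError:
        # Not a hex number (e.g. some other GUID or an empty string).
        return False


print(looks_like_esp_type("C12A7328-F81F-11D2-BA4B-00A0C93EC93B"))  # True
print(looks_like_esp_type("0xef"))                                  # True
print(looks_like_esp_type("0fc63daf-8483-4772-8e79-3d69d8477de4"))  # False (Linux filesystem)
```

This only checks the partition type; `grub-install` may also care about the filesystem on the partition, which the sketch does not look at.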

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Hm

        Found primary UEFI ESP: sda-part1
        Found UEFI ESP(s) for grub install: ['sda-part1', 'sda-part2']

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

    flag: boot
    id: sda-part2

Why does sda-part2 have the bootable flag?

For some reason we try to install resilient boot configuration with two ESPs sda-part1 & sda-part2, whilst the second one is not at all an ESP.

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1907280] Re: grub-install: error: /var/lib/grub/esp doesn't look like an EFI partition.

On Tue, Dec 8, 2020 at 11:00 AM Dimitri John Ledkov <email address hidden> wrote:

> flag: boot
> id: sda-part2
>
> Why does sda-part2 have the bootable flag?
>
> For some reason we try to install resilient boot configuration with two
> ESPs sda-part1 & sda-part2, whilst the second one is not at all an ESP.
>

For legacy reasons, curtin also looks at flag: boot partitions. We can add
additional checks to see whether a partition with flag: boot on it has a
format entry with fat32. We're not checking that today, so sda2 with
flag: boot was selected as a secondary ESP, but it turned out not to be one.

See curtin/commands/curthooks.py:uefi_find_grub_device_ids

It shouldn't be too much effort to scan the storage config of the
esp_partitions and confirm they have the correct vFAT filesystem set.
Otherwise we can ignore them.
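A minimal Python sketch of the check described above, assuming the storage config is available as a list of dicts in the usual curtin layout (`partition` entries with `id` and `flag`, `format` entries with `fstype` and `volume`). The function name `confirmed_esp_ids` and the set of accepted filesystem names are assumptions made for illustration; this is not the code in `uefi_find_grub_device_ids`.

```python
# Illustrative only: a hypothetical helper, not curtin's implementation.
FAT_FSTYPES = {"fat32", "vfat"}  # assumed spellings of the ESP filesystem


def confirmed_esp_ids(storage_config):
    """Return ids of partitions that have flag: boot AND a FAT format entry."""
    # Collect the ids of all volumes that are formatted as FAT.
    fat_volumes = {
        item.get("volume")
        for item in storage_config
        if item.get("type") == "format" and item.get("fstype") in FAT_FSTYPES
    }
    # Keep only boot-flagged partitions that appear in that set.
    return [
        item["id"]
        for item in storage_config
        if item.get("type") == "partition"
        and item.get("flag") == "boot"
        and item["id"] in fat_volumes
    ]


# Reduced config modelled on this bug: both partitions carry flag: boot,
# but only sda-part1 is formatted as fat32.
example = [
    {"id": "sda", "type": "disk", "ptable": "gpt"},
    {"id": "sda-part1", "type": "partition", "device": "sda", "flag": "boot"},
    {"id": "sda-part1_format", "type": "format", "fstype": "fat32", "volume": "sda-part1"},
    {"id": "sda-part2", "type": "partition", "device": "sda", "flag": "boot"},
    {"id": "sda-part2_format", "type": "format", "fstype": "ext4", "volume": "sda-part2"},
]

print(confirmed_esp_ids(example))  # ['sda-part1']
```

With a filter like this, sda-part2 would be ignored rather than handed to grub-install as a secondary ESP.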

description: updated
no longer affects: subiquity
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Yeah, so this is a bad config really: with ptable==gpt, flag==boot is how you indicate that the partition is to be set up as an ESP. Why is MAAS doing that?

Revision history for this message
Ryan Harper (raharper) wrote :

Prior to multi-ESP support in curtin, I think this config would have worked
fine, even though the additional flag: boot on the /boot partition (which is
not an ESP) is incorrect. I don't think curtin did much with a second partition
flagged as boot when in UEFI mode, which is why this slipped through.

I suspect the /boot partition might trigger MAAS to auto-generate something
marking it bootable? That's likely new in MAAS. Curtin has an existing example
config that matches what's used in this deployment (UEFI, NVMe, bcache, /boot
partition and /boot/efi), but it does not have flag: boot on the /boot
partition.

https://git.launchpad.net/curtin/tree/examples/tests/bcache-ceph-nvme.yaml

Changed in curtin:
status: New → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

A side effect is that when installation fails with this issue, the boot order is changed so that network booting is no longer first, but booting from disk is.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Subscribed field-high; we have a workaround of changing the 'boot' disk to some other disk, but we regularly hit this in our test lab.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

I see this too with MAAS 2.9.1 when using storage layouts where / is on a different disk than /boot/efi. One workaround is to place / on the same physical disk as /boot/efi.

Revision history for this message
Alberto Donato (ack) wrote :

Jason, could you please provide how you configured partitions in MAAS?

Changed in maas:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for curtin because there has been no activity for 60 days.]

Changed in curtin:
status: Incomplete → Expired
Revision history for this message
Felipe Reyes (freyes) wrote :

I ran into this problem using MAAS version 2.9.2 (9164-g.ac176b5c4-0ubuntu1~20.04.1).

The curtin configuration used and the curtin logs can be found at https://private-fileshare.canonical.com/~freyes/maas-curtin.tar.bz2

Filesystems
bcache59    1.9 TB     ext4    /
sda-part1   536.9 MB   fat32   /boot/efi
sda-part2   1.1 GB     ext4    /boot

I'm setting this bug back to "new"

Changed in curtin:
status: Expired → New
Changed in maas:
status: Expired → New
Revision history for this message
Alberto Donato (ack) wrote :

Felipe, could you please attach the output of `maas $profile partitions read $machine_id sda`?

Changed in maas:
status: New → Incomplete
Revision history for this message
Ksawery Dziekoński (ksdziekonski) wrote :

I'm deterministically replicating this on MAAS nodes set to a bcache storage layout while provisioning a Juju controller on them.

Revision history for this message
Alberto Donato (ack) wrote :

@ksdziekonski could you please provide the exact configuration steps (API commands or UI steps) that you did when configuring the machine?

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Rafael (rvallel) wrote :

I have this issue too.

Found out while deploying with MAAS 3.1

I have a new server that needs a more modern Realtek Ethernet driver and so requires 20.04+hwe.

To reproduce, I simply select the LVM template in storage, then deploy with 20.04 and the hwe kernel.

Deployment fails with that error, and the BIOS boot settings are left messed up: the hard disk, not PXE, remains the first boot option.

Servers of the previous hardware version deployed fine with this storage template and 18.04. Lots of changes since then, though.

Revision history for this message
Rafael (rvallel) wrote :

I can also confirm that removing the boot flag on the /boot partition fixes the issue, although I guess compatibility with legacy boot is lost. Another possibility is simply to remove the /boot partition after assigning the LVM profile. This works (similar to the flat template).
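For anyone who wants to apply the first workaround to a generated storage config rather than by hand, here is a hedged Python sketch. It assumes the config is a list of dicts in the curtin storage layout, and the helper name `drop_boot_flag_from_non_esp` is hypothetical. It is not something MAAS or curtin provides, and, as noted above, dropping the flag may affect legacy boot.

```python
# Illustrative only: a hypothetical helper, not part of MAAS or curtin.
import copy


def drop_boot_flag_from_non_esp(storage_config, fat_fstypes=("fat32", "vfat")):
    """Return a copy of the config with flag: boot removed from non-FAT partitions."""
    # Map each formatted volume id to its filesystem type.
    fstype_by_volume = {
        item.get("volume"): item.get("fstype")
        for item in storage_config
        if item.get("type") == "format"
    }
    cleaned = copy.deepcopy(storage_config)  # leave the caller's config untouched
    for item in cleaned:
        if (
            item.get("type") == "partition"
            and item.get("flag") == "boot"
            and fstype_by_volume.get(item["id"]) not in fat_fstypes
        ):
            del item["flag"]
    return cleaned


config = [
    {"id": "sda-part1", "type": "partition", "device": "sda", "flag": "boot"},
    {"id": "sda-part1_format", "type": "format", "fstype": "fat32", "volume": "sda-part1"},
    {"id": "sda-part2", "type": "partition", "device": "sda", "flag": "boot"},
    {"id": "sda-part2_format", "type": "format", "fstype": "ext4", "volume": "sda-part2"},
]

fixed = drop_boot_flag_from_non_esp(config)
print([p["id"] for p in fixed if p.get("flag") == "boot"])  # ['sda-part1']
```

The original list is not modified, so the two versions can be compared before deploying.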

Revision history for this message
Alberto Donato (ack) wrote :

Rafael,

could you please provide the configuration for your machine (maas $profile machine read $system_id) and steps you did for configuring storage?

Changed in maas:
status: Incomplete → New
status: New → Incomplete