grub-install breaks when ESP is on raid

Bug #1466150 reported by Tony Middleton on 2015-06-17
This bug affects 22 people
Affects: grub-installer (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

I run a server with mirrored (RAID1) disks using grub-efi.

Root and /boot and /boot/grub are on mirrored partitions.

I have EFI partitions on both disks but it is not possible to RAID1 these as they are FAT32. On an EFI system, grub-install will only install to one of the EFI partitions, so after running grub-install you have to remember to copy the EFI file across.
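
The manual copy step described above could be scripted roughly as follows. This is only a sketch: the function name `sync_esp` and the second mount point `/boot/efi2` are assumptions for illustration, not something grub or Ubuntu provides.

```shell
#!/bin/sh
# Sketch: mirror the contents of the primary ESP onto a second, unmirrored ESP.
# Both mount points are assumptions; adjust them to the local layout.

sync_esp() {
    src="$1"   # e.g. /boot/efi (primary ESP, already updated by grub-install)
    dst="$2"   # e.g. /boot/efi2 (hypothetical mount point of the second ESP)

    [ -d "$src/EFI" ] || { echo "no EFI directory under $src" >&2; return 1; }
    mkdir -p "$dst/EFI" || return 1

    # Copy recursively; FAT has no ownership or permission bits to preserve.
    cp -R "$src/EFI/." "$dst/EFI/"
}
```

It would be called as `sync_esp /boot/efi /boot/efi2` after each grub-install run. It only copies files and does not touch the firmware boot entries.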

Could grub configuration and grub-install be amended to automatically install to multiple disks?

Searching around there seem to be many people asking this question without any elegant solution.

Phillip Susi (psusi) wrote :

Choice of filesystem has nothing to do with raid. You can put any FAT32 partition, including the ESP, on a raid1 if you want. You just need to make sure to use md format 0.9 or 1.0 instead of 1.1 or 1.2 so the firmware will still recognize it.
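
As a rough illustration of the advice above, the sketch below prints (it does not run) an mdadm command that creates a RAID1 array with 1.0 metadata. The helper name `esp_mirror_cmd` is made up for this example, and the device names are assumptions borrowed from later comments in this thread.

```shell
#!/bin/sh
# Sketch: print (not run) an mdadm command that mirrors two ESP partitions
# with the RAID superblock at the end of each member.

esp_mirror_cmd() {
    md="$1"; shift        # array device, e.g. /dev/md0
    # --metadata=1.0 (like 0.90) stores the superblock at the end of the
    # device, so each member still starts with a plain FAT filesystem.
    echo "mdadm --create $md --level=1 --raid-devices=$# --metadata=1.0 $*"
}

esp_mirror_cmd /dev/md0 /dev/sda1 /dev/sdb1
```

Printing the command first leaves room to check the device names before running anything, since creating an array over an existing partition is destructive.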

Changed in grub2 (Ubuntu):
status: New → Invalid
Tony Middleton (ximera) wrote :

Thank you for the reply. I had read a number of articles on this which put me off trying that option and implied rather clumsy solutions which was why I raised the request. However, I have now tried it and barring one problem it works and I can boot off either disk.

The problem occurred when I ran grub-install. It failed at the efibootmgr stage.

Here is the end of the log when installing to a non-raid ESP.

grub-install: info: copying `/boot/grub/x86_64-efi/core.efi' -> `/boot/efi/EFI/ubuntu/grubx64.efi'.
grub-install: info: Registering with EFI: distributor = `ubuntu', path = `\EFI\ubuntu\grubx64.efi', ESP at hostdisk//dev/sda,gpt1.
grub-install: info: executing efibootmgr --version </dev/null >/dev/null.
grub-install: info: executing modprobe -q efivars.
grub-install: info: executing efibootmgr -b 0000 -B.
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0005,0006,0002,0003,0004
Boot0002* Hard Drive
Boot0003* CD/DVD Drive
Boot0004* Removable Drive
Boot0005* UEFI: SanDisk Cruzer Edge 1.26
Boot0006* UEFI: ST31500341AS
grub-install: info: executing efibootmgr -c -d /dev/sda -p 1 -w -L ubuntu -l \EFI\ubuntu\grubx64.efi.
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0005,0006,0002,0003,0004
Boot0002* Hard Drive
Boot0003* CD/DVD Drive
Boot0004* Removable Drive
Boot0005* UEFI: SanDisk Cruzer Edge 1.26
Boot0006* UEFI: ST31500341AS
Boot0000* ubuntu
Installation finished. No error reported.

And here is the log for a raid1 ESP

grub-install: info: copying `/boot/grub/x86_64-efi/core.efi' -> `/boot/efi/EFI/ubuntu/grubx64.efi'.
grub-install: info: Registering with EFI: distributor = `ubuntu', path = `\EFI\ubuntu\grubx64.efi', ESP at mduuid/f1f50fa9a6d7d446dddc9c93a8fc41a3.
grub-install: info: executing efibootmgr --version </dev/null >/dev/null.
grub-install: info: executing modprobe -q efivars.
grub-install: info: executing efibootmgr -c -d.
efibootmgr: option requires an argument -- 'd'
efibootmgr version 0.11.0
usage: efibootmgr [options]
        -a | --active sets bootnum active
        -A | --inactive sets bootnum inactive
        -b | --bootnum XXXX modify BootXXXX (hex)
        -B | --delete-bootnum delete bootnum (hex)
        -c | --create create new variable bootnum and add to bootorder
        -D | --remove-dups remove duplicate values from BootOrder
        -d | --disk disk (defaults to /dev/sda) containing loader
        -e | --edd [1|3|-1] force EDD 1.0 or 3.0 creation variables, or guess
        -E | --device num EDD 1.0 device number (defaults to 0x80)
        -g | --gpt force disk with invalid PMBR to be treated as GPT
        -H | --acpi_hid XXXX set the ACPI HID (used with -i)
        -i | --iface name create a netboot entry for the named interface
        -l | --loader name (defaults to \EFI\redhat\grub.efi)
        -L | --label label Boot manager display label (defaults to "Linux")
        -n | --bootnext XXXX set BootNext to XXXX (hex)
        -N | --delete-bootnext delete BootNext
        -o | --bootorder XXXX,YYYY,ZZZZ,... explicitly set BootOrder (hex)
        -O | --delete-bootorder delete BootOrder
        -p | --part part (defaults to 1...


Phillip Susi (psusi) wrote :

Looks like grub-install got confused by the raid and failed to pass the proper device to efibootmgr.

Changed in grub2 (Ubuntu):
importance: Undecided → Low
status: Invalid → Triaged
summary: - Feature request: For EFI system grub-install should be able to install
- to multiple disks
+ grub-install breaks when ESP is on raid
Alek_A (ackbeat) wrote :

Hi! I have the same issue. We are running Ubuntu 16.04 on our servers.
/boot/efi is on /dev/md0 (a raid1 array with metadata 0.90, made of 4 small partitions, one at the beginning of each of the 4 disks).
The system boots normally, but I believe that is because the EFI entries were created earlier, when the partitions were not yet in the mirror. Or maybe the BIOS detects them somehow!

# grub-install
Installing for x86_64-efi platform.
efibootmgr: option requires an argument -- 'd'
efibootmgr version 0.12
usage: efibootmgr [options]
        -a | --active sets bootnum active
        -A | --inactive sets bootnum inactive
        -b | --bootnum XXXX modify BootXXXX (hex)
        -B | --delete-bootnum delete bootnum (hex)
        -c | --create create new variable bootnum and add to bootorder
        -C | --create-only create new variable bootnum and do not add to bootorder
        -D | --remove-dups remove duplicate values from BootOrder
        -d | --disk disk (defaults to /dev/sda) containing loader
        -e | --edd [1|3|-1] force EDD 1.0 or 3.0 creation variables, or guess
        -E | --device num EDD 1.0 device number (defaults to 0x80)
        -g | --gpt force disk with invalid PMBR to be treated as GPT
        -i | --iface name create a netboot entry for the named interface
        -l | --loader name (defaults to \EFI\redhat\grub.efi)
        -L | --label label Boot manager display label (defaults to "Linux")
        -n | --bootnext XXXX set BootNext to XXXX (hex)
        -N | --delete-bootnext delete BootNext
        -o | --bootorder XXXX,YYYY,ZZZZ,... explicitly set BootOrder (hex)
        -O | --delete-bootorder delete BootOrder
        -p | --part part (defaults to 1) containing loader
        -q | --quiet be quiet
        -t | --timeout seconds set boot manager timeout waiting for user input.
        -T | --delete-timeout delete Timeout.
        -u | --unicode | --UCS-2 pass extra args as UCS-2 (default is ASCII)
        -v | --verbose print additional information
        -V | --version return version and exit
        -w | --write-signature write unique sig to MBR if needed
        -@ | --append-binary-args file append extra args from file (use "-" for stdin)
        -h | --help show help/usage
Installation finished. No error reported.

AnrDaemon (anrdaemon) wrote :

Setting up linux-signed-generic-lts-xenial (4.4.0.47.34) ...
Setting up linux-libc-dev:amd64 (3.13.0-101.148) ...
Setting up shim-signed (1.19~14.04.1+0.8-0ubuntu2) ...
Installing for x86_64-efi platform.
efibootmgr: option requires an argument -- 'd'
efibootmgr version 0.5.4
usage: efibootmgr [options]
        -a | --active sets bootnum active
        -A | --inactive sets bootnum inactive
        -b | --bootnum XXXX modify BootXXXX (hex)
        -B | --delete-bootnum delete bootnum (hex)
        -c | --create create new variable bootnum and add to bootorder
        -d | --disk disk (defaults to /dev/sda) containing loader
        -e | --edd [1|3|-1] force EDD 1.0 or 3.0 creation variables, or guess
        -E | --device num EDD 1.0 device number (defaults to 0x80)
        -g | --gpt force disk with invalid PMBR to be treated as GPT
        -H | --acpi_hid XXXX set the ACPI HID (used with -i)
        -i | --iface name create a netboot entry for the named interface
        -l | --loader name (defaults to \elilo.efi)
        -L | --label label Boot manager display label (defaults to "Linux")
        -n | --bootnext XXXX set BootNext to XXXX (hex)
        -N | --delete-bootnext delete BootNext
        -o | --bootorder XXXX,YYYY,ZZZZ,... explicitly set BootOrder (hex)
        -O | --delete-bootorder delete BootOrder
        -p | --part part (defaults to 1) containing loader
        -q | --quiet be quiet
           | --test filename don't write to NVRAM, write to filename.
        -t | --timeout seconds set boot manager timeout waiting for user input.
        -T | --delete-timeout delete Timeout.
        -u | --unicode | --UCS-2 pass extra args as UCS-2 (default is ASCII)
        -U | --acpi_uid XXXX set the ACPI UID (used with -i)
        -v | --verbose print additional information
        -V | --version return version and exit
        -w | --write-signature write unique sig to MBR if needed
        -@ | --append-binary-args file append extra args from file (use "-" for stdin)
Installation finished. No error reported.

AnrDaemon (anrdaemon) wrote :

ESP is also on RAID1 with 0.90 meta.

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue Aug 16 11:08:49 2016
     Raid Level : raid1
     Array Size : 266176 (259.98 MiB 272.56 MB)
  Used Dev Size : 266176 (259.98 MiB 272.56 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Nov 18 15:10:03 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 8738f3e7:91896e1d:e368bf24:bd0fce41
         Events : 0.23

    Number Major Minor RaidDevice State
       0 8 1 0 active sync /dev/sda1
       1 8 17 1 active sync /dev/sdb1

Alek_A (ackbeat) wrote :

Thanks for reporting this, AnrDaemon! I'm not sure where exactly the issue is. Probably the grub script should be modified so that, if it detects that the ESP is on a raid, it looks into /proc/mdstat and repeats the process for each of the component devices. Or it should pass some extra args to efibootmgr, as reported :)
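
The /proc/mdstat idea above might look something like the sketch below. It is not how grub-install works; the function name `md_members` is an assumption, and it simply lists the component devices of one array from mdstat-formatted text on stdin.

```shell
#!/bin/sh
# Sketch: list the component devices of an md array from /proc/mdstat text.

md_members() {
    # $1: array name without /dev/, e.g. md0; mdstat text is read from stdin
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\[[0-9]+\]/) {    # members look like "sda1[0]"
                    dev = $i
                    sub(/\[.*/, "", dev)    # drop "[N]" and flags such as "(F)"
                    print "/dev/" dev
                }
            }
        }'
}
```

On a live system it would be fed with `md_members md0 < /proc/mdstat`, and each printed device could then be handed to efibootmgr separately.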

Seva Gluschenko (gvs-ya) wrote :

The bug is still reproducible in 16.04.2 LTS. It is particularly funny that grub-install reports no errors in the end:

...
grub-install: info: copying `/usr/lib/shim/shimx64.efi.signed' -> `/boot/efi/EFI/Ubuntu/shimx64.efi'.
grub-install: info: copying `/usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed' -> `/boot/efi/EFI/Ubuntu/grubx64.efi'.
grub-install: info: copying `/usr/lib/shim/mmx64.efi.signed' -> `/boot/efi/EFI/Ubuntu/mmx64.efi'.
grub-install: info: copying `/usr/lib/shim/fbx64.efi.signed' -> `/boot/efi/EFI/Ubuntu/fbx64.efi'.
grub-install: info: copying `/boot/grub/x86_64-efi/load.cfg' -> `/boot/efi/EFI/Ubuntu/grub.cfg'.
grub-install: info: Registering with EFI: distributor = `Ubuntu', path = `\EFI\Ubuntu\shimx64.efi', ESP at mduuid/37a190814f3ecd3b3eba8b653989e9c1.
grub-install: info: executing efibootmgr --version </dev/null >/dev/null.
grub-install: info: executing modprobe -q efivars.
grub-install: info: executing efibootmgr -c -d.
efibootmgr: option requires an argument -- 'd'
efibootmgr version 0.12
usage: efibootmgr [options]
 -a | --active sets bootnum active
 -A | --inactive sets bootnum inactive
 -b | --bootnum XXXX modify BootXXXX (hex)
 -B | --delete-bootnum delete bootnum (hex)
 -c | --create create new variable bootnum and add to bootorder
 -C | --create-only create new variable bootnum and do not add to bootorder
 -D | --remove-dups remove duplicate values from BootOrder
 -d | --disk disk (defaults to /dev/sda) containing loader
 -e | --edd [1|3|-1] force EDD 1.0 or 3.0 creation variables, or guess
 -E | --device num EDD 1.0 device number (defaults to 0x80)
 -g | --gpt force disk with invalid PMBR to be treated as GPT
 -i | --iface name create a netboot entry for the named interface
 -l | --loader name (defaults to \EFI\redhat\grub.efi)
 -L | --label label Boot manager display label (defaults to "Linux")
 -n | --bootnext XXXX set BootNext to XXXX (hex)
 -N | --delete-bootnext delete BootNext
 -o | --bootorder XXXX,YYYY,ZZZZ,... explicitly set BootOrder (hex)
 -O | --delete-bootorder delete BootOrder
 -p | --part part (defaults to 1) containing loader
 -q | --quiet be quiet
 -t | --timeout seconds set boot manager timeout waiting for user input.
 -T | --delete-timeout delete Timeout.
 -u | --unicode | --UCS-2 pass extra args as UCS-2 (default is ASCII)
 -v | --verbose print additional information
 -V | --version return version and exit
 -w | --write-signature write unique sig to MBR if needed
 -@ | --append-binary-args file append extra args from file (use "-" for stdin)
 -h | --help show help/usage
Installation finished. No errors reported.
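
Since grub-install reports success even though no entry was registered, one way to catch the failure is to inspect the efibootmgr listing afterwards. The sketch below assumes the listing format shown in the logs in this report; the function name `has_boot_entry` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if an efibootmgr listing (on stdin) contains an
# entry with the given label.

has_boot_entry() {
    awk -v label="$1" '
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ] / {
            name = substr($0, 11)       # text after "BootXXXX* "
            sub(/\t.*$/, "", name)      # drop a device path, if one is printed
            if (name == label) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, `efibootmgr | has_boot_entry ubuntu || echo "no ubuntu boot entry" >&2` could be run right after grub-install, before any reboot.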

John Robinson (john.robinson) wrote :

Just an FYI, this is still present in 16.04.4 which I just `apt-get upgrade`d to. I have /boot/EFI on a md RAID1 with 1.0 metadata.

I'm not sure whether the severity is correct; as far as I could tell when hacking about with the efibootmgr command, the process had removed the 'ubuntu' boot entry before failing to add the new entry, so I suspect my system was unbootable.

I have now used efibootmgr to add a new boot entry, but I'm working remotely and I'm not going to attempt to reboot until I'm sitting in front of the machine with a rescue disc/stick to hand! In particular, I am not clear what level the efibootmgr command is operating at: when I did the add on /dev/sda, subsequently listing it suggested it was also present on /dev/sdb... and I'm not sure whether in fact I ought to have two separate boot entries, one for each disc, so that if sda is a bit screwed but still present, booting can proceed with sdb.

Tony Middleton (ximera) wrote :

I agree that the severity is incorrect. Whenever grub-install is run, i.e. whenever grub is upgraded, I end up with an unbootable system. Yes, my trusty rescue disk sorts it out, but that's not a solution.

Referring to the comment above, I have two EFI entries, one for each disk. They both get trashed by grub-install so I recreate them by hand - either before reboot or after using the rescue disk.
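
Recreating one entry per disk by hand, as described above, could be scripted along these lines. This is a sketch that only prints the efibootmgr commands for review; the helper names `split_part` and `boot_entry_cmds` are assumptions, and the loader path is copied from the logs earlier in this report.

```shell
#!/bin/sh
# Sketch: print one "efibootmgr -c" command per ESP partition.

split_part() {
    # Print "<disk> <partition number>" for a partition device node.
    case "$1" in
        /dev/nvme*p[0-9]*|/dev/mmcblk*p[0-9]*)
            echo "${1%p[0-9]*} ${1##*p}" ;;
        *)
            num="${1##*[!0-9]}"          # trailing digits, e.g. "1"
            echo "${1%"$num"} $num" ;;
    esac
}

boot_entry_cmds() {
    label="$1"; loader="$2"; shift 2
    for part in "$@"; do
        dp=$(split_part "$part")
        disk=${dp% *}; num=${dp#* }
        # Tag each label with the partition name so the entries stay distinct.
        printf "efibootmgr -c -d %s -p %s -L '%s (%s)' -l '%s'\n" \
            "$disk" "$num" "$label" "${part##*/}" "$loader"
    done
}

boot_entry_cmds ubuntu '\EFI\ubuntu\grubx64.efi' /dev/sda1 /dev/sdb1
```

Having a separate entry for each disk means the firmware can still fall through to the second ESP if the first disk is missing or unreadable.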

Phillip Susi (psusi) on 2018-06-13
affects: grub2 (Ubuntu) → grub-installer (Ubuntu)
Changed in grub-installer (Ubuntu):
importance: Low → High
tags: added: id-5b16b1664f4c7a0f1fb8839f
Wladimir Mutel (mwg) wrote :

More and more people stumble upon this - https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1720572

Wladimir Mutel (mwg) wrote :

There is a patch at https://savannah.gnu.org/bugs/?46805; why not try it?

Mathieu Trudel-Lapierre wrote :

All that patch would do is show a pretty error message earlier, not actually make sure the problem is fixed -- upgrades would still fail; installs would still fail on RAID.

This has to do with grub not grokking the metadata format on disk, which is avoidable by using metadata 0.90.

On 8/21/2018 12:29 PM, Mathieu Trudel-Lapierre wrote:
> This has to do with grub not grokking the metadata format on disk, which
> is avoidable by using metadata 0.90.

What? Grub understands all of the metadata formats.

AnrDaemon (anrdaemon) wrote :

Then why does it fail to install the bootloader?… Even on 0.90 meta?

AnrDaemon (anrdaemon) wrote :

# grub-install --recheck --no-floppy /dev/md0
Installing for x86_64-efi platform.
efibootmgr: option requires an argument -- 'd'
efibootmgr version 0.5.4
usage: efibootmgr [options]

Sujith Pandel (sujithpandel) wrote :

This might be the fix?

Handle partition name parsing and formatting for partitioned md
https://github.com/rhboot/efivar/commit/576f55b02d9ec478bd5157352c884e3543bcca58

Alejandro Mery (amery) wrote :

I wrote a not-so-little wrapper for efibootmgr to pretend grub-install isn't broken:

# cd /bin
# mv efibootmgr efibootmgr.real
# ln -s efibootmgr.sh efibootmgr

efibootmgr.sh
---
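#!/bin/sh
# Wrapper around the real efibootmgr (renamed to efibootmgr.real).
# When grub-install calls "efibootmgr ... -d" with no device, because the ESP
# is on an md RAID1, create one boot entry per RAID member instead.
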
die() {
        echo "$*" >&2
        exit 1
}

run_device() {
        local devdir="$1" label= label_set=
        local devname= dev= partition=
        shift

        if [ "x$1" = "x-L" ]; then
                label_set=true
                label="$2"
                shift 2
        fi

        devdir="$(cd "$devdir" && pwd -P)"

        if [ -s "$devdir/partition" ]; then
                read partition < "$devdir/partition"
                devname="${devdir##*/}"
                devdir="${devdir%/*}"
        fi
        dev="/dev/${devdir##*/}"

        if [ -n "$label_set" -a -z "$label" ]; then
                label=$devname
        else
                [ -n "$label" ] || label="$(lsb_release -si)"

                label="$label ($devname)"
        fi

        set -x
        "${0%.sh}.real" "$@" -L "$label" -d "$dev" ${partition:+-p $partition}
}

run_raid() {
        local x= argv=
        local label= label_set= label_next=
        local device= devdir=
        local md_level= md_disks=

        # extract label
        for x; do
                if [ "$x" = "-L" ]; then
                        label_next=true
                        label_set=
                        label=
                elif [ -n "$label_next" ]; then
                        label_next=
                        label_set=true
                        label="$x"
                else
                        x=$(echo -n "$x" | sed -e 's|"|\\"|g')
                        argv="$argv \"$x\""
                fi
        done

        if [ -n "$label_set" ]; then
                x=$(echo -n "$label" | sed -e 's|"|\\"|g')
                argv="-L \"$x\" $argv"
        fi

        device="$(grep ' /boot/efi ' /proc/mounts | cut -d' ' -f1)"
        [ -b "$device" ] || die "ESP not mounted"
        device="$(readlink -f "$device")"
        devdir=/sys/class/block/${device##*/}

        if read md_level < $devdir/md/level 2> /dev/null; then
                if [ "$md_level" = raid1 ]; then
                        read md_disks < $devdir/md/raid_disks
                        for i in $(seq "$md_disks"); do
                                set +x
                                eval "run_device '$devdir/md/rd$(($i - 1))/block' $argv"
                        done
                else
                        die "RAID $md_level not supported"
                fi
        else
                # not RAID
                set -x
                eval "run_device '$devdir' $argv"
        fi
        exit 0
}

run_normal() {
        exec "${0%.sh}.real" "$@"
}

set -eu

argv=
i=1
for x; do
        if [ "$x" = "-d" -a $i -eq $# ]; then
                # /boot/efi is /dev/md and grub-install can't handle it yet
                eval "run_raid $argv"
                die "never reached"
        fi

        : $((i = i+1))
        x=$(echo -n "$x" | sed -e 's|"|\\"|g')
        argv="$argv \"$x\""
done

set -x
eval "run_normal $argv"
---

AnrDaemon (anrdaemon) wrote :

I had to configure my BIOS manually to point to both ESP partitions.
Installing new kernel is still a thrilling experience, but so far reboots were smooth.

John Robinson (john.robinson) wrote :

I just upgraded my system yesterday, which meant an updated grub, which meant grub-install was run again, which meant this happened to me again. Now on Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-150-generic x86_64) with grub2 2.02~beta2-36ubuntu3.22 and its dependencies. Then I found that Alejandro Mery's script in #17 worked for me.

John Robinson (john.robinson) wrote :

I meant #18, sorry.
