Grub2 does not detect MD raid (level 1) 1.0 superblocks on 4k block devices

Bug #1817713 reported by JulietDeltaGolf
This bug affects 1 person

Affects           Status    Importance  Assigned to  Milestone
grub2 (CentOS)    Expired   High
grub2 (Ubuntu)    New       Undecided   Unassigned

Bug Description

grub-install will fail if the /boot partition is located on an MD RAID level 1 device backed by 4k-sector devices (NVMe drives in my case).

Steps to Reproduce:

1) Create a RAID 1 array with a 1.0 superblock from two 4k-sector devices
2) Create a partition for /boot and format it FAT32
3) Mount /boot
4) grub-install complains about not being able to find the mduuid device

Revision history for this message
kwalker (kwalker-redhat-bugs) wrote:

Description of problem:
 Per the description above, Grub2 does not currently detect MD raid 1.0 superblocks when written to 4k block devices. The issue is only visible on disks that have 4K native sector sizes.

See below example:

    Disk /dev/vdb: 10.7 GB, 10737418240 bytes, 2621440 sectors
    Units = sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk label type: dos
    Disk identifier: 0x6c5f13de

       Device Boot      Start        End     Blocks  Id System
    /dev/vdb1             256     128255     512000  83 Linux
    /dev/vdb2          128256    2621439    9972736  83 Linux

    Disk /dev/vdc: 10.7 GB, 10737418240 bytes, 2621440 sectors
    Units = sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disk label type: dos
    Disk identifier: 0x00000000

       Device Boot      Start        End     Blocks  Id System
    /dev/vdc1             256     128255     512000  83 Linux
    /dev/vdc2          128256    2621439    9972736  83 Linux

Two (degraded) raid devices are created, using 1.0 and 1.2 metadata revisions.

    # cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 vdc1[1]
          511680 blocks super 1.2 [2/1] [_U]

    md0 : active raid1 vdb1[1]
          511936 blocks super 1.0 [2/1] [_U]

    unused devices: <none>

For the 1.2 superblock format, the following is visible:

    # grub2-probe --device /dev/md1 --target fs_uuid
    f2ebfe82-55ab-45a8-b391-39c77f3c489e

Whereas the 1.0 format, with the superblock at the end of the device, returns:

  # grub2-probe --device /dev/md0 --target fs_uuid
  grub2-probe: error: disk `mduuid/4589a761dde10c78a204bcfd705df061' not found.

Version-Release number of selected component (if applicable):
  grub2-2.02-0.44.el7.x86_64

How reproducible:
  Easily - With a 4k native storage device for boot

Steps to Reproduce:
1. Install a system

2. Migrate to an MD raid (level 1) metadata 1.0 configuration using the process outlined in the article below:

    How do I create /boot on mdadm RAID 1 after installation in RHEL 7? - Red Hat Customer Portal
    https://access.redhat.com/solutions/1360363

3. Issue a "grub2-probe --device /dev/md<device> --target fs_uuid" against the device which /boot is installed to

Actual results:

    # grub2-probe --device /dev/md0 --target fs_uuid
    grub2-probe: error: disk `mduuid/4589a761dde10c78a204bcfd705df061' not found.

Expected results:

    # grub2-probe --device /dev/md1 --target fs_uuid
    f2ebfe82-55ab-45a8-b391-39c77f3c489e

Additional info:
 Note: the issue was originally noticed at installation time, since the default superblock format for PPC64 (PReP boot) systems is 1.0 according to the python-blivet library used by the anaconda installer; code snippet below:

    def preCommitFixup(self, *args, **kwargs):
        """ Determine create parameters for this set """
        mountpoints = kwargs.pop("mountpoints")
        log_method_call(self, self.name, mountpoints)

        if "/boot" in mountpoints:
    ...

Read more...

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-04-19 13:58 EDT-------
(In reply to comment #4)
...
> Additional info:
> Note, the issue was originally noted at installation-time as the default
> superblock format for PPC64 (PReP boot) systems is 1.0 according to the
> python-blivet library, used by the anaconda installer, code snippet below:
>
> def preCommitFixup(self, *args, **kwargs):
>     """ Determine create parameters for this set """
>     mountpoints = kwargs.pop("mountpoints")
>     log_method_call(self, self.name, mountpoints)
>
>     if "/boot" in mountpoints:
>         bootmountpoint = "/boot"
>     else:
>         bootmountpoint = "/"
>
>     # If we are used to boot from we cannot use 1.1 metadata
>     if getattr(self.format, "mountpoint", None) == bootmountpoint or \
>        getattr(self.format, "mountpoint", None) == "/boot/efi" or \
>        self.format.type == "prepboot":
>         self.metadataVersion = "1.0"

This is probably the key observation. It doesn't really make sense to have this restriction in place for disks with a PReP partition. The above appears to have already been backed out via the following commit:

commit 8bce84025e0f0af9b2538a2611e5d52257a82881
Author: David Lehman <email address hidden>
Date: Wed May 27 16:07:05 2015 -0500

Use the default md metadata version for everything except /boot/efi.

Now that we've moved to grub2 this is no longer necessary for /boot.
As far as I know we have never actually allowed PReP on md, so that's
not needed either. Apparently UEFI firmware/bootloader still needs it.

Related: rhbz#1061711

Revision history for this message
hannsj_uhl (hannsjuhl-redhat-bugs) wrote:

.

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-04-21 12:48 EDT-------
Here's where I have stopped grub2-install in gdb:

(gdb) run -vvv /dev/sda1
...
grub-core/osdep/hostdisk.c:415: opening the device `/dev/sda2' in open_device()
(gdb) bt
#0 grub_util_fd_seek (fd=0x8, off=0x3dcf8000) at grub-core/osdep/unix/hostdisk.c:105
#1 0x000000001013f3ac in grub_util_fd_open_device (disk=0x101e88e0, sector=0x3dcf8, flags=0x101000, max=0x3fffffffe018)
at grub-core/osdep/linux/hostdisk.c:450
#2 0x000000001013c56c in grub_util_biosdisk_read (disk=0x101e88e0, sector=0x404f8, size=0x8, buf=0x101ee130 "\370\016\347\267\377?")
at grub-core/kern/emu/hostdisk.c:289
#3 0x0000000010133ccc in grub_disk_read_small_real (disk=0x101e88e0, sector=0x2027c0, offset=0x6000, size=0x100, buf=0x3fffffffe308)
at grub-core/kern/disk.c:344
#4 0x0000000010133fac in grub_disk_read_small (disk=0x101e88e0, sector=0x2027c0, offset=0x6000, size=0x100, buf=0x3fffffffe308)
at grub-core/kern/disk.c:401
#5 0x00000000101341a8 in grub_disk_read (disk=0x101e88e0, sector=0x2027f0, offset=0x0, size=0x100, buf=0x3fffffffe308)
at grub-core/kern/disk.c:440
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0)
at grub-core/disk/mdraid1x_linux.c:149
#7 0x0000000010155eb0 in scan_disk_partition_iter (disk=0x101e88e0, p=0x3fffffffe548, data=0x101e8860) at grub-core/disk/diskfilter.c:161
#8 0x0000000010147000 in part_iterate (dsk=0x101e88e0, partition=0x3fffffffe660, data=0x3fffffffe900) at grub-core/kern/partition.c:196
#9 0x000000001015a2b8 in grub_partition_msdos_iterate (disk=0x101e88e0, hook=0x10146f24 <part_iterate>, hook_data=0x3fffffffe900)
at grub-core/partmap/msdos.c:196
#10 0x000000001014718c in grub_partition_iterate (disk=0x101e88e0, hook=0x10155ccc <scan_disk_partition_iter>, hook_data=0x101e8860)
at grub-core/kern/partition.c:233
#11 0x00000000101560c0 in scan_disk (name=0x101e8860 "hd0", accept_diskfilter=0x1) at grub-core/disk/diskfilter.c:204
#12 0x00000000101591ec in grub_diskfilter_get_pv_from_disk (disk=0x101e8810, vg_out=0x3fffffffea30) at grub-core/disk/diskfilter.c:1173
#13 0x0000000010154f9c in grub_util_get_ldm (disk=0x101e8810, start=0x2800) at grub-core/disk/ldm.c:876
#14 0x0000000010135bb0 in grub_util_biosdisk_get_grub_dev (os_dev=0x101e5fd0 "/dev/sda2") at util/getroot.c:437
#15 0x000000001013531c in grub_util_pull_device (os_dev=0x101e5fd0 "/dev/sda2") at util/getroot.c:111
#16 0x000000001013a6a0 in grub_util_pull_device_os (os_dev=0x101e7520 "/dev/md0", ab=GRUB_DEV_ABSTRACTION_RAID)
at grub-core/osdep/linux/getroot.c:1064
#17 0x0000000010135300 in grub_util_pull_device (os_dev=0x101e7520 "/dev/md0") at util/getroot.c:108
#18 0x0000000010006688 in main (argc=0x3, argv=0x3ffffffff528) at util/grub-install.c:1233
(gdb) frame 6
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0)
at grub-core/disk/mdraid1x_linux.c:149
149 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) print minor_version
$34 = 0x0
(gdb) print *((*disk)->partition)
$35 = {number = 0x1, start = 0x2800, len = 0x200000, offset = 0x0, index = 0x1, parent = ...

Read more...
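
For context on the grub_disk_read() call in frame #6 above: the sketch below shows where an md 1.x superblock is expected to be, expressed in GRUB's fixed 512-byte disk sectors. It is a minimal illustration assuming the standard md 1.x placement rules (1.0: at least 8 KiB, and less than 12 KiB, from the end of the device, aligned down to 4 KiB; 1.1: at the start; 1.2: 4 KiB from the start); the helper name is made up and this is not verbatim grub code.

============================
md 1.x superblock placement (illustrative sketch)
============================
#include <stdint.h>

/* Candidate superblock sector for an md 1.x minor version, in 512-byte
   sectors (GRUB_DISK_SECTOR_SIZE units), hence the "8 * 2" (8 KiB) and
   "4 * 2" (4 KiB) constants.  Hypothetical helper, for illustration only.  */
static uint64_t
md1x_candidate_sector (uint64_t size_in_512b_sectors, int minor_version)
{
  switch (minor_version)
    {
    case 0:  /* 1.0: near the end of the device, 4 KiB aligned */
      return (size_in_512b_sectors - 8 * 2) & ~(uint64_t) (4 * 2 - 1);
    case 1:  /* 1.1: start of the device */
      return 0;
    case 2:  /* 1.2: 4 KiB from the start */
      return 4 * 2;
    default:
      return 0;
    }
}
============================

Only the 1.0 case depends on the device size, which would explain why the 1.2 array is detected in the original report while the 1.0 array is not: if the size fed into this calculation does not correspond to the md member as it actually sits on a 4k-native device, the grub_disk_read() in frame #6 lands somewhere other than the superblock and the mduuid disk is never registered.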

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-06-08 13:55 EDT-------
I just ran a fresh installation enabling RAID on a 4k block disk and I could not reproduce the problem stated in the additional notes ("the issue was originally noted at installation-time"). Here is the information right after the first boot:

[root@rhel-grub ~]# uname -a
Linux rhel-grub 3.10.0-514.el7.ppc64le #1 SMP Wed Oct 19 11:27:06 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux

[root@rhel-grub ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

[root@rhel-grub ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md126      9.0G  1.2G  7.9G  13% /
devtmpfs        8.0G     0  8.0G   0% /dev
tmpfs           8.0G     0  8.0G   0% /dev/shm
tmpfs           8.0G   14M  8.0G   1% /run
tmpfs           8.0G     0  8.0G   0% /sys/fs/cgroup
/dev/md127     1018M  145M  874M  15% /boot
tmpfs           1.6G     0  1.6G   0% /run/user/0

[root@rhel-grub ~]# cat /proc/mdstat
md126 : active raid1 sdb1[1] sda2[0]
9423872 blocks super 1.2 [2/2] [UU]
bitmap: 1/1 pages [64KB], 65536KB chunk

md127 : active raid1 sdb2[1] sda3[0]
1048512 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

[root@rhel-grub ~]# grub2-probe --device /dev/md126 --target fs_uuid
5de99add-1cf2-41f0-ba54-c08067e404d4

[root@rhel-grub ~]# grub2-probe --device /dev/md127 --target fs_uuid
d48f8f83-717b-405e-9e7b-02ba37de959a

[root@rhel-grub ~]# parted /dev/sda u s p
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:

Number  Start     End       Size      Type     File system  Flags
 1      256s      1279s     1024s     primary               boot, prep
 2      1280s     2359295s  2358016s  primary               raid
 3      2359296s  2621439s  262144s   primary               raid

[root@rhel-grub ~]# parted /dev/sdb u s p
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sdb: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:

Number  Start     End       Size      Type     File system  Flags
 1      256s      2358271s  2358016s  primary               raid
 2      2358272s  2620415s  262144s   primary               raid

I will do another installation without raid and then migrate it to raid to check if the problem happens.

So, for now, can someone confirm this problem happens during install time?

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-06-12 17:33 EDT-------
As expected, migrating /boot to raid 1 using metadata 1.0 when it is the first partition after prep fails:

[root@rhel-grub2-1 ~]# mdadm -D /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Mon Jun 12 17:22:07 2017
Raid Level : raid1
Array Size : 1048512 (1023.94 MiB 1073.68 MB)
Used Dev Size : 1048512 (1023.94 MiB 1073.68 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Mon Jun 12 17:26:45 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Name : rhel-grub2-1:0 (local to host rhel-grub2-1)
UUID : 537bfbf4:0b89fb58:f50f14c3:ba5f2bf3
Events : 33

    Number   Major   Minor   RaidDevice State
       2       253       2        0      active sync   /dev/vda2
       1       253      17        1      active sync   /dev/vdb1

[root@rhel-grub2-1 ~]# parted /dev/vda u s p
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:

Number  Start    End       Size      Type     File system  Flags
 1      256s     1279s     1024s     primary               boot, prep
 2      1280s    263423s   262144s   primary               raid
 3      263424s  2360575s  2097152s  primary

[root@rhel-grub2-1 ~]# parted /dev/vdb u s p
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:

Number  Start  End      Size     Type     File system  Flags
 1      256s   262399s  262144s  primary               raid

[root@rhel-grub2-1 ~]# grub2-probe --device /dev/md0 --target fs_uuid
grub2-probe: error: disk `mduuid/537bfbf40b89fb58f50f14c3ba5f2bf3' not found.

[root@rhel-grub2-1 ~]# grub2-install /dev/vdb
Installing for powerpc-ieee1275 platform.
grub2-install: error: disk `mduuid/537bfbf40b89fb58f50f14c3ba5f2bf3' not found.

Now, the interesting thing is that I was not able to migrate /boot to RAID 1 using metadata 1.0 when /boot is not the first partition after the PReP partition (the layout the installer used in comment #15). When I tried the same layout as the installer, grub was not able to find the /boot partition located after the root partition.

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-09-26 08:09 EDT-------
Hi,

I still didn't have time to work on this. I will try to work on this bz this week.
I will let you know when I have updates.

Thanks
Victor

Revision history for this message
ccoates (ccoates-redhat-bugs) wrote:

Is there any progress on this?

We've just been hit by the same issue on an E850 during install, getting to the point where we can't install a system using software RAID 1.

As it stands, we're having to install without RAID to get a system up and running...

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-10-27 10:51 EDT-------
(In reply to comment #23)
> Is there any progress on this?
>
> We've just been hit by the same issue on an E850 during install, getting to
> the point where we can't install a system using software RAID 1.
>
> As it stands, we're having to install without RAID to get a system up and
> running...

The install-side issue should already have been addressed for RHEL 7.4 via RH Bug 1184945. The easy workaround is to not use version 1.0 metadata for the RAID config.

Revision history for this message
ccoates (ccoates-redhat-bugs) wrote:

(In reply to IBM Bug Proxy from comment #9)
> ------- Comment From <email address hidden> 2017-10-27 10:51 EDT-------
> (In reply to comment #23)
> > Is there any progress on this?
> >
> > We've just been hit by the same issue on an E850 during install, getting to
> > the point where we can't install a system using software RAID 1.
> >
> > As it stands, we're having to install without RAID to get a system up and
> > running...
>
> The install-side issue should already have been addressed for RHEL 7.4 via
> RH Bug 1184945. The easy workaround is to not use version 1.0 metadata for
> the RAID config.

Unfortunately you can't specify a metadata type via kickstart for md devices - so that's still a show-stopper for using RHEL7.3 on an E850.

As a workaround to allow RAID during install, I've had to specify /boot as a btrfs partition, which worked perfectly fine.

Still - this isn't exactly an ideal solution for anyone using an E850 with RHEL 7.3... The customer I'm building out for isn't prepared to use RHEL 7.4 yet.

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-10-27 13:23 EDT-------
Hello Kevin,

The engineer who was in charge of this bug is leaving IBM.

I am working on this bug as we speak (I started this week), and I think I am onto something. I will post my results by the end of the day.

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-10-27 16:59 EDT-------
For now, I tried to look inside grub-probe to see if I could find any clues.

Through gdb, I noticed that when using a 4k block size with --metadata=1.0 on the MD RAID disks, the `dev` variable at util/grub-probe.c +376 comes back unset, so a grub_util_error() is thrown.

============================
util/grub-probe.c +376
============================
376 dev = grub_device_open (drives_names[0]);
377 if (! dev)
378 grub_util_error ("%s", grub_errmsg);
============================

Now, comparing grub_device_open() code on grub-core/kern/device.c, and using both --metadata=1.0 and --metadata=0.90:

============================
grub-core/kern/device.c +47
============================
47 dev = grub_malloc (sizeof (*dev));
48 if (! dev)
49 goto fail;
50
51 dev->net = NULL;
52 /* Try to open a disk. */
53 dev->disk = grub_disk_open (name);
54 if (dev->disk)
55 return dev;
56 if (grub_net_open && grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
57 {
58 grub_errno = GRUB_ERR_NONE;
59 dev->net = grub_net_open (name);
60 }
61
62 if (dev->net)
63 return dev;
64
65 fail:
66 grub_free (dev);
============================

CURIOSITY: The addresses that came out of the grub_malloc() on line 47 seem a bit odd with 1.0.

=====
RAID using --metadata=1.0: FAILS on grub2-probe
=====
Breakpoint 4, grub_device_open (name=0x10185290 "mduuid/ceebb143b7f740ba41794f2e88b1e1de") at grub-core/kern/device.c:48
48 if (! dev)
(gdb) print *dev
$3 = {
disk = 0x0,
net = 0x3fffb7ed07b8 <main_arena+104>
}

(gdb) print *dev->net
$4 = {
server = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020",
name = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020",
protocol = 0x3fffb7ed07b8 <main_arena+104>,
packs = {
first = 0x3fffb7ed07b8 <main_arena+104>,
last = 0x3fffb7ed07c8 <main_arena+120>,
count = 70367534974920
},
offset = 70367534974936,
fs = 0x3fffb7ed07d8 <main_arena+136>,
eof = -1209202712,
stall = 16383
}
=====

=====
RAID using --metadata=0.90: SUCCESS on grub2-probe
=====
Breakpoint 2, grub_device_open (name=0x10185830 "mduuid/1940b3311771bbb17b777c24c48ad94b") at grub-core/kern/device.c:48
48 if (! dev)
(gdb) print *dev
$1 = {
disk = 0x0,
net = 0x10185120
}

(gdb) print *dev->net
$3 = {
server = 0x61 <Address 0x61 out of bounds>,
name = 0x21 <Address 0x21 out of bounds>,
protocol = 0x3fffb7ed07b8 <main_arena+104>,
packs = {
first = 0x3fffb7ed07b8 <main_arena+104>,
last = 0x20,
count = 32
},
offset = 7742648064551382888,
fs = 0x64762f7665642f2f,
eof = 98,
stall = 0
}
=====

Anyhow, this was only an allocation, and on line 51 of grub-core/kern/device.c dev->net receives NULL.

Using --metadata=1.0, `dev` is allocated and execution moves into grub_disk_open() on line 53, which here returns NULL. Thus the branches at lines 54, 56 and 62 are not taken, and execution eventually falls through to the fail label on line 65.

=====
RAID using --metadata=1.0: FAILS on grub2-probe
=====
Breakpoint 2, grub_device_open (name=0x10185290 "mduuid/7266eba408736585cf9c00e3a2342fdc") at grub-core/kern/device.c:54
54 if (dev->disk)
(gdb) print *dev
$3 = {
disk = 0x0,
net = 0x0
}
=====

Whereas using --metadata=0.9...

Read more...

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-10-31 09:49 EDT-------
Going deeper in the rabbit hole from IBM Comment 27 / RH Comment 12:

============================
grub-core/kern/disk.c +187
============================
187 grub_disk_t
188 grub_disk_open (const char *name)
189 {
...
224 for (dev = grub_disk_dev_list; dev; dev = dev->next)
225 {
226 if ((dev->open) (raw, disk) == GRUB_ERR_NONE)
227 break;
228 else if (grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
229 grub_errno = GRUB_ERR_NONE;
230 else
231 goto fail;
232 }
233
234 if (! dev)
235 {
236 grub_error (GRUB_ERR_UNKNOWN_DEVICE, N_("disk `%s' not found"),
237 name);
238 goto fail;
239 }
============================

Using --metadata=1.0, `dev` comes out NULL after the for loop on line 224, whereas with 0.90 it is set. Moreover, the grub_error() message on line 236 is the one being printed by grub2-probe.

=====
FAILURE grub2-probe - RAID using --metadata=1.0
=====
Breakpoint 1, grub_disk_open (name=0x10185290 "mduuid/0ef5c3920edae097657894d84aef753d") at grub-core/kern/disk.c:234
234 if (! dev)
(gdb) print dev
$1 = (grub_disk_dev_t) 0x0
(gdb) print *dev
Cannot access memory at address 0x0
(gdb) s
236 grub_error (GRUB_ERR_UNKNOWN_DEVICE, N_("disk `%s' not found"),
=====

=====
SUCCESS grub2-probe - RAID using --metadata=0.90
=====
Breakpoint 1, grub_disk_open (name=0x10185830 "mduuid/ebae38d5105eed037b777c24c48ad94b") at grub-core/kern/disk.c:234
234 if (! dev)
(gdb) print dev
$1 = (grub_disk_dev_t) 0x10165e80 <grub_diskfilter_dev>
(gdb) print *dev
$2 = {
name = 0x10146d50 "diskfilter",
id = GRUB_DISK_DEVICE_DISKFILTER_ID,
iterate = 0x101107cc <grub_diskfilter_iterate>,
open = 0x10110fd8 <grub_diskfilter_open>,
close = 0x10111120 <grub_diskfilter_close>,
read = 0x1011220c <grub_diskfilter_read>,
write = 0x1011227c <grub_diskfilter_write>,
memberlist = 0x10110950 <grub_diskfilter_memberlist>,
raidname = 0x10110df4 <grub_diskfilter_getname>,
next = 0x10165fb8 <grub_procfs_dev>
}
(gdb) s
240 if (disk->log_sector_size > GRUB_DISK_CACHE_BITS + GRUB_DISK_SECTOR_BITS
=====

Since `dev` is used for several device types in grub, and the struct is essentially an interface of function pointers, each dev carries its own implementations. In our case we are dealing with a grub_disk_dev_t, and through gdb we can see that dev->open() on line 226 is actually grub_diskfilter_open(), defined in:

============================
grub-core/disk/diskfilter.c +419
============================
419 static grub_err_t
420 grub_diskfilter_open (const char *name, grub_disk_t disk)
421 {
422 struct grub_diskfilter_lv *lv;
423
424 if (!is_valid_diskfilter_name (name))
425 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s",
426 name);
427
428 lv = find_lv (name);
429
430 if (! lv)
431 {
432 scan_devices (name);
433 if (grub_errno)
434 {
435 grub_print_error ();
436 grub_errno = GRUB_ERR_NONE;
437 }
438 lv = find_lv (name);
439 }
440
441 if (!lv)
442 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s",
443 name);
444
445 disk->id = lv->number;
446 disk->data = lv;
447
448 d...

Read more...

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-12-11 15:28 EDT-------
Finally had quality time for this bug again. Carrying on:

Setting a breakpoint in grub_diskfilter_vg_register(), we can observe that the device is registered differently with 1.0 than with 0.90; with 1.0, the diskfilter being registered is the "rhel" LVM volume group of my OS instead of my raid. By taking a backtrace, I also noticed that even the call stack differs when grub_diskfilter_vg_register() is called:

=====
--- bad 2017-12-07 13:44:39.654222238 -0200
+++ good 2017-12-07 13:43:52.563919187 -0200
@@ -1,36 +1,39 @@
-Breakpoint 1, grub_diskfilter_vg_register (vg=0x10185ea0) at grub-core/disk/diskfilter.c:849
+Breakpoint 1, grub_diskfilter_vg_register (vg=0x10183530) at grub-core/disk/diskfilter.c:849
849 for (lv = vg->lvs; lv; lv = lv->next)
(gdb) print *vg
$4 = {
- uuid = 0x10185ef0 "xZL9PN-dXgE-Vflt-rtI5-Y203-gQ6e-TBS0Mz",
- uuid_len = 38,
- name = 0x10185e80 "rhel",
- extent_size = 8192,
- pvs = 0x10185f20,
- lvs = 0x10185b00,
+ uuid = 0x10183300 "?\300\263\362\242\326]\232\334\316r\364\264\370\267e/en_US.!",
+ uuid_len = 16,
+ name = 0x10183580 "md/1",
+ extent_size = 1,
+ pvs = 0x101850c0,
+ lvs = 0x10183690,
next = 0x0,
driver = 0x0
}
(gdb) bt
-#0 grub_diskfilter_vg_register (vg=0x10185ea0) at grub-core/disk/diskfilter.c:849
-#1 0x0000000010009810 in grub_lvm_detect (disk=0x101827d0, id=0x3ffffffde4e8, start_sector=0x3ffffffde4e0) at grub-core/disk/lvm.c:744
-#2 0x00000000101102b0 in scan_disk_partition_iter (disk=0x101827d0, p=0x3ffffffde568, data=0x101812d0) at grub-core/disk/diskfilter.c:161
-#3 0x0000000010101400 in part_iterate (dsk=0x101827d0, partition=0x3ffffffde680, data=0x3ffffffde920) at grub-core/kern/partition.c:196
-#4 0x00000000101146b8 in grub_partition_msdos_iterate (disk=0x101827d0, hook=0x10101324 <part_iterate>, hook_data=0x3ffffffde920)
+#0 grub_diskfilter_vg_register (vg=0x10183530) at grub-core/disk/diskfilter.c:849

==> +#1 0x0000000010112dd0 in grub_diskfilter_make_raid (uuidlen=16,

+ uuid=0x10183300 "?\300\263\362\242\326]\232\334\316r\364\264\370\267e/en_US.!", nmemb=2, name=0x3ffffffde3e8 "rhel-7.3:1",
+ disk_size=20969216, stripe_size=0, layout=0, level=1) at grub-core/disk/diskfilter.c:1030
+#2 0x000000001000a414 in grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde588, start_sector=0x3ffffffde580)
+ at grub-core/disk/mdraid1x_linux.c:202
+#3 0x00000000101102b0 in scan_disk_partition_iter (disk=0x101834a0, p=0x3ffffffde608, data=0x10182800) at grub-core/disk/diskfilter.c:161
+#4 0x0000000010101400 in part_iterate (dsk=0x101834a0, partition=0x3ffffffde720, data=0x3ffffffde9c0) at grub-core/kern/partition.c:196
+#5 0x00000000101146b8 in grub_partition_msdos_iterate (disk=0x101834a0, hook=0x10101324 <part_iterate>, hook_data=0x3ffffffde9c0)
at grub-core/partmap/msdos.c:196
-#5 0x000000001010158c in grub_partition_iterate (disk=0x101827d0, hook=0x101100cc <scan_disk_partition_iter>, hook_data=0x101812d0)
+#6 0x000000001010158c in grub_partition_iterate (disk=0x101834a0, hook=0x101100cc <scan_disk_partition_iter>, hook_data=0x10182800)
at grub-core/kern/partition.c:233
-#6 0x00000000101104c0 in scan_disk (name=0...

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2018-01-16 14:09 EDT-------
Just for the record, I have also reproduced this bug on x86_64 with Ubuntu 17.10.

Revision history for this message
pjones (pjones-redhat-bugs) wrote:

This works for me using grub2-2.02-0.65.el7_4.2 on an EFI machine with a 4k disk:

[root@pjones3 tmp]# blockdev --getbsz /dev/sdb
4096
[root@pjones3 tmp]# blockdev --getbsz /dev/sdb2
4096
[root@pjones3 tmp]# blockdev --getbsz /dev/md0
4096
[root@pjones3 tmp]# ./usr/sbin/grub2-probe --target fs_uuid -d /dev/sdb2
c1b85a71-972d-4b69-84cc-e6a05326a4c8
[root@pjones3 tmp]# ./usr/sbin/grub2-probe --target fs_uuid -d /dev/md0
c1b85a71-972d-4b69-84cc-e6a05326a4c8

Note the detection in mdraid1x_linux.c still isn't right, because it computes the location of the superblock data from the size of /dev/sdb rather than /dev/sdb2, but grub2-probe and booting the machine with /boot on this raid are both successful.

I see this was reported with grub2-2.02-0.44.el7 ; does the newer package work for you?

Revision history for this message
bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2018-06-13 15:24 EDT-------
Hi, I'm still getting the result:

./grub-probe: error: disk `mduuid/b184ce73be4a91ec1b586dcce8ee7f9b' not found.

One thing I noticed is that we have some sector lengths hardcoded for 512 bytes, and it seems grub runs into problems when trying to find the magic number for 1.0 metadata.

I dumped the variables returned by the disk read when mdraid1x_linux.c tries to find the magic number, and it is reading from the wrong position.

When I changed the hardcoded sector lengths to 4k instead, mdraid1x_linux.c was able to find the magic number, although it still wasn't able to successfully find the disk.

I'm still investigating this problem and hope I find something in a couple of days.

Thank you
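
To double-check where mdadm actually put the 1.0 superblock on a member partition, the small standalone program below reads the expected offset and checks the magic. It is illustrative only (not grub code): it assumes mdadm's 1.0 placement rule (at least 8 KiB from the end of the member, aligned down to 4 KiB, computed from the member's own size) and a little-endian host, and the program itself is hypothetical.

============================
check-md10-magic.c (illustrative userspace check)
============================
#include <fcntl.h>
#include <linux/fs.h>      /* BLKGETSIZE64 */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MD_SB_MAGIC 0xa92b4efcU   /* md 1.x superblock magic (little-endian on disk) */

int
main (int argc, char **argv)
{
  if (argc != 2)
    {
      fprintf (stderr, "usage: %s /dev/<md-member-partition>\n", argv[0]);
      return 1;
    }

  int fd = open (argv[1], O_RDONLY);
  if (fd < 0)
    {
      perror ("open");
      return 1;
    }

  uint64_t bytes = 0;
  if (ioctl (fd, BLKGETSIZE64, &bytes) < 0)
    {
      perror ("BLKGETSIZE64");
      return 1;
    }

  /* 1.0 rule: at least 8 KiB from the end of the member, aligned down
     to 4 KiB, computed from the member's own size (not the whole disk).  */
  uint64_t off = (bytes - 8 * 1024) & ~(uint64_t) (4 * 1024 - 1);

  uint32_t magic = 0;
  if (pread (fd, &magic, sizeof magic, (off_t) off) != (ssize_t) sizeof magic)
    {
      perror ("pread");
      return 1;
    }

  printf ("offset %llu: magic 0x%08x (%s)\n",
          (unsigned long long) off, magic,
          magic == MD_SB_MAGIC ? "md 1.x superblock found"
                               : "no superblock magic here");
  close (fd);
  return 0;
}
============================

If the magic 0xa92b4efc shows up at that offset on a member of the failing array, it supports the reading above: the superblock is where mdadm wrote it, and it is the 512-byte-based offset arithmetic on the grub side that looks in the wrong place on 4 KiB-native devices.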

Revision history for this message
hannsj_uhl (hannsjuhl-redhat-bugs) wrote:

OK ... with no news on this bugzilla for exactly one year,
I am closing this Red Hat bugzilla now;
please reopen if required, using the current RHEL 7.7 at that point ...
... thanks for your support ...

Changed in grub2 (CentOS):
importance: Unknown → High
status: Unknown → Expired