Comment 12 for bug 1817713

bugproxy (bugproxy-redhat-bugs) wrote:

------- Comment From <email address hidden> 2017-10-27 16:59 EDT-------
For now, I looked inside grub-probe for clues.

Using gdb, I noticed that with a 4K block size and --metadata=1.0 on the MD RAID disks, grub_device_open() returns NULL at util/grub-probe.c +376. Because `dev` is NULL, grub_util_error() is called on line 378 with the message left in grub_errmsg.

============================
util/grub-probe.c +376
============================
376   dev = grub_device_open (drives_names[0]);
377   if (! dev)
378     grub_util_error ("%s", grub_errmsg);
============================

Next, I compared how grub_device_open() in grub-core/kern/device.c behaves under --metadata=1.0 and --metadata=0.90:

============================
grub-core/kern/device.c +47
============================
47   dev = grub_malloc (sizeof (*dev));
48   if (! dev)
49     goto fail;
50
51   dev->net = NULL;
52   /* Try to open a disk. */
53   dev->disk = grub_disk_open (name);
54   if (dev->disk)
55     return dev;
56   if (grub_net_open && grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
57     {
58       grub_errno = GRUB_ERR_NONE;
59       dev->net = grub_net_open (name);
60     }
61
62   if (dev->net)
63     return dev;
64
65  fail:
============================

CURIOSITY: with 1.0, the (still uninitialized) memory returned by grub_malloc() on line 47 looks a bit odd: dev->net points into glibc's main_arena.

=====
RAID using --metadata=1.0: FAILS on grub2-probe
=====
Breakpoint 4, grub_device_open (name=0x10185290 "mduuid/ceebb143b7f740ba41794f2e88b1e1de") at grub-core/kern/device.c:48
48 if (! dev)
(gdb) print *dev
$3 = {
disk = 0x0,
net = 0x3fffb7ed07b8 <main_arena+104>
}

(gdb) print *dev->net
$4 = {
server = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020",
name = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020",
protocol = 0x3fffb7ed07b8 <main_arena+104>,
packs = {
first = 0x3fffb7ed07b8 <main_arena+104>,
last = 0x3fffb7ed07c8 <main_arena+120>,
count = 70367534974920
},
offset = 70367534974936,
fs = 0x3fffb7ed07d8 <main_arena+136>,
eof = -1209202712,
stall = 16383
}
=====

=====
RAID using --metadata=0.90: SUCCESS on grub2-probe
=====
Breakpoint 2, grub_device_open (name=0x10185830 "mduuid/1940b3311771bbb17b777c24c48ad94b") at grub-core/kern/device.c:48
48 if (! dev)
(gdb) print *dev
$1 = {
disk = 0x0,
net = 0x10185120
}

(gdb) print *dev->net
$3 = {
server = 0x61 <Address 0x61 out of bounds>,
name = 0x21 <Address 0x21 out of bounds>,
protocol = 0x3fffb7ed07b8 <main_arena+104>,
packs = {
first = 0x3fffb7ed07b8 <main_arena+104>,
last = 0x20,
count = 32
},
offset = 7742648064551382888,
fs = 0x64762f7665642f2f,
eof = 98,
stall = 0
}
=====

Anyhow, this is just a fresh allocation whose contents have not been initialized yet, so those values are leftovers from the heap; line 51 of grub-core/kern/device.c sets dev->net to NULL regardless.

With --metadata=1.0, `dev` is allocated and execution moves into grub_disk_open() on line 53, which returns NULL here, leaving dev->disk NULL. Execution therefore skips the branches on lines 54, 56 and 62 and falls through to the fail: label on line 65, where `dev` is freed and grub_device_open() ends up returning NULL.

=====
RAID using --metadata=1.0: FAILS on grub2-probe
=====
Breakpoint 2, grub_device_open (name=0x10185290 "mduuid/7266eba408736585cf9c00e3a2342fdc") at grub-core/kern/device.c:54
54 if (dev->disk)
(gdb) print *dev
$3 = {
disk = 0x0,
net = 0x0
}
=====

With --metadata=0.90, by contrast, grub_disk_open() returns a valid disk pointer, so dev->disk is non-NULL and grub_device_open() returns `dev` on line 55.

=====
RAID using --metadata=0.90: SUCCESS on grub2-probe
=====
Breakpoint 1, grub_device_open (name=0x10185830 "mduuid/a29da500c684c0d47b777c24c48ad94b") at grub-core/kern/device.c:54
54 if (dev->disk)
(gdb) print *dev
$1 = {
disk = 0x101834a0,
net = 0x0
}
=====

Riddle me this: why?

More GRUB to come... Next, I will look into grub_disk_open() and mdadm.