zfs on root fails with grub syntax error with multidisks pools
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| grub2 (Ubuntu) | | Medium | Jean-Baptiste Lallement | |
| Focal | | Medium | Jean-Baptiste Lallement | |
| grubzfs-testsuite (Ubuntu) | | Medium | Jean-Baptiste Lallement | |
| Focal | | Medium | Jean-Baptiste Lallement | |
Bug Description
[Description]
When a pool is created on several devices (for example with cache and log devices on separate disks, or as a mirror or raidz), grub-probe reports every device that makes up the pool, one per line, when it is queried for the device attached to the boot directory. This multi-line output is not handled by 10_linux_zfs, which then generates an invalid grub configuration file. Only the case of 1 pool = 1 device was considered.
[Test Case]
1. Create a mirrored pool (a quick way to confirm the triggering condition is sketched after this test case)
$ zpool create <pool> mirror /dev/Xda1 /dev/Xda2
2. run update-grub
Expected result:
grub configuration is generated successfully.
Actual result:
The generated grub configuration file is incomplete and its syntax is invalid
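A quick way to confirm that a system is exposed to this condition, assuming grub-probe is installed and /boot lives on the ZFS pool, is to query the device backing /boot directly:

# Print the device(s) grub thinks back /boot. On a mirrored (or log/cache-
# augmented) pool this prints several lines instead of the single line
# that 10_linux_zfs expects.
grub-probe --target=device /boot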
[Regression Potential]
Low. The patch takes the first device returned by grub-probe which is the first device of data of the mirror.
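The patch itself is not reproduced in this description; a minimal sketch of the approach it describes, with hypothetical variable names, is to keep only the first line of grub-probe's output:

# Hypothetical sketch of the fix described above, not the actual patch:
# take only the first device reported by grub-probe for the boot directory.
boot_devices="$(grub-probe --target=device /boot)"
boot_device="$(echo "${boot_devices}" | head -1)"

With a mirror, that first line is the first data device of the mirror, which is what the regression note above refers to.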
[Original Description]
At the end of the upgrade from 19.04 to 19.10, the post-processing run of update-grub reports:
Syntax error at line 185
Syntax errors are detected in generated GRUB config file.
Ensure that there are no errors in /etc/default/grub
and /etc/grub.d/* files or please file a bug report with
/boot/grub/
run-parts: /etc/kernel/
dpkg: error processing package linux-image-
installed linux-image-
Errors were encountered while processing:
friendly-recovery
grub-efi-amd64
grub-efi
grub-efi-
shim-signed
linux-
The system used https:/
The syntax error in grub.cfg.new is an extra } on line 185. However, comparing the grub.cfg.new to the previously generated grub.cfg under 19.04, there is a significant quantity of configuration missing.
Manually running update-grub generates the same error. /etc/default/grub is the only file changed from default installation to include zswap. This file was not changed prior to upgrade.
The error is reported during the processing of /etc/grub.
jpb (jbrown10) wrote : | #1 |
jpb (jbrown10) wrote : | #2 |
jpb (jbrown10) wrote : | #3 |
jpb (jbrown10) wrote : | #4 |
jpb (jbrown10) wrote : | #5 |
jpb (jbrown10) wrote : | #6 |
jpb (jbrown10) wrote : | #7 |
One more piece of information. I have 3 zfs pools defined:
rpool for my root
dpool for my data (virtualization and other)
bpool for my external usb backup device
So, my bpool is different from the new 19.10 bpool that supports the boot process. I don't know if this is a problem, but I thought I'd mention it since it duplicates a pool name.
jpb (jbrown10) wrote : | #8 |
I modified 10_linux_zfs to comment out the set -e and add the variables for pkgdatadir and the GRUB* variables from /etc/default/grub and ran it to see the output.
Notice at the end the double closing brace, and that the linux line names no kernel image and the initrd line has no value, even though both were found at the beginning when the root dataset was correctly identified.
jpb@explorer:
Found linux image: vmlinuz-
Found initrd image: initrd.
Found linux image: vmlinuz-
Found initrd image: initrd.
Found linux image: vmlinuz-
Found initrd image: initrd.
Found linux image: vmlinuz-
Found initrd image: initrd.
Found linux image: vmlinuz-
Found initrd image: initrd.
********** removed a ton of snapshots that were listed which found the linux image and initrd image. *****************
Found linux image: vmlinuz-
Found initrd image: initrd.
function gfxmode {
set gfxpayload="${1}"
if [ "${1}" = "keep" ]; then
set vt_handoff=
else
set vt_handoff=
fi
}
if [ "${recordfail}" != 1 ]; then
if [ -e ${prefix}
if hwmatch ${prefix}
if [ ${match} = 0 ]; then
set linux_gfx_mode=keep
else
set linux_gfx_mode=text
fi
else
set linux_gfx_mode=text
fi
else
set linux_gfx_mode=keep
fi
else
set linux_gfx_mode=text
fi
export linux_gfx_mode
menuentry 'Ubuntu 19.10' --class ubuntu --class gnu-linux --class gnu --class os ${menuentry_
recordfail
load_video
gfxmode ${linux_gfx_mode}
insmod gzio
if [ "${grub_platform}" = xen ]; then insmod xzio; insmod lzopio; fi
insmod part_gpt
insmod zfs
set root='hd2,gpt1'
if [ x$feature_
search --no-floppy --fs-uuid --set=root --hint-
else
search --no-floppy --fs-uuid --set=root ba04856b80ac4244
fi
linux root=ZFS=
initrd
}
}
jpb (jbrown10) wrote : | #9 |
zsys is currently not installed.
jpb (jbrown10) wrote : | #10 |
I'm getting closer to finding the problem. It looks like in function get_dataset_info() in 10_linux_zfs the content of initrd_list, kernel_list, and last_booted_kernel is not being echoed back to the caller. I don't know why yet; I haven't fully grasped the shell script.
zoolook (nbensa) wrote : | #11 |
Is your rpool mirrored?
In a test environment, if I detach one of the drives, I can run update-grub. Reattaching the drive breaks update-grub.
I'm currently stuck in 19.04 because of this bug.
# zpool status
pool: zroot
state: ONLINE
scan: scrub repaired 0B in 2h59m with 0 errors on Sun Oct 13 03:23:12 2019
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda3 ONLINE 0 0 0
sdb3 ONLINE 0 0 0
jpb (jbrown10) wrote : | #12 |
Yes, my rpool is mirrored.
pool: rpool
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub repaired 0B in 0 days 05:45:25 with 0 errors on Sun Oct 13 06:09:31 2019
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-
wwn-
logs
nvme-
cache
nvme0n1p4 ONLINE 0 0 0
errors: No known data errors
In 10_linux_zfs, there is an if statement that echoes data:
if [ -n "${initrd_list}" -a -n "${kernel_list}" ]; then
echo "${dataset}
else
grub_warn "didn't find any valid initrd or kernel."
fi
The problem is that at execution time, the last 3 parameters are dropped even though they have data.
+ [ -n /ROOT/ubuntu@
+ echo rpool/ROOT/
You can see from the expanded debug of the run (set -xv) that both initrd_list and kernel_list had data that included the kernel and initrd but within the if, it gets dropped. That is causing the generated grub.cfg.new to be missing the kernel, initrd, and the advanced submenu items that would normally be generated.
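The truncation described above can be reproduced in isolation. The following is an illustration only, with made-up kernel and initrd names, not the real 10_linux_zfs code: when one field of the echoed record contains a line break, a consumer that reads one record per line loses everything after the break.

# Illustration only, not the real script: a field holding multi-line
# grub-probe output splits the echoed record across lines.
device="/dev/sdc1
/dev/sda1"
kernel_list="/vmlinuz-5.3.0-19-generic"
initrd_list="/initrd.img-5.3.0-19-generic"
echo "${device} ${kernel_list} ${initrd_list}"
# Output:
#   /dev/sdc1
#   /dev/sda1 /vmlinuz-5.3.0-19-generic /initrd.img-5.3.0-19-generic
# A caller that reads one line per dataset sees only "/dev/sdc1" and treats
# the kernel and initrd fields as missing.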
Launchpad Janitor (janitor) wrote : | #13 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in grub2 (Ubuntu): | |
status: | New → Confirmed |
jpb (jbrown10) wrote : | #14 |
My current work around is to manually maintain the grub.cfg in /boot/grub. Using the version that was generated under 19.04, I have simply updated the kernel and initrd to reflect the new modules.
This still means that any package I install that needs to regenerate the grub.cfg will fail. So far it is only related to cleanup of the kernel and I expect the same will be true with the add of new kernels.
Not ideal but I haven't found the root of the problem with the 10_linux_zfs script.
Rex Tsai (chihchun) wrote : | #15 |
I believe the invalid grub menu is caused by a stray "}" when there is only a main section.
jpb (jbrown10) wrote : Re: [Bug 1848856] Re: Upgrade from 19.04 to 19.10 with zfs on root fails with grub syntax error | #17 |
Hi Rex - you essentially removed the line that set at_least_
Yes, that removes an extra "}" but if you look at the generated script, the
linux line specifies no kernel image and the initrd line includes no file.
This will result in an unbootable system.
linux root=ZFS=
intel_iommu=on iommu=pt rootdelay=3 zswap.enabled=1 zswap.compresso
zswap.zpool=z3fold ${vt_handoff}
initrd
}
So, I believe the problem is deeper than a simple syntax error, there is
content missing. Additionally, when you look at the debug output you'll
see there were other kernels installed which should have gone under a
submenu. Content is being truncated or not carried through. This might be
a dash problem -- I can't tell.
But to remove the extra "}" does not resolve the issue.
Thanks,
Jeff
On Sun, Oct 20, 2019 at 9:35 AM Rex Tsai <email address hidden> wrote:
> I believe the invalid grub menu is caused by wrong "}", when there is
> only a main section.
>
> ** Patch added: "1848856.debdiff"
>
> https:/
Rex Tsai (chihchun) wrote : | #16 |
I have a similar setup, an SSD as a cache for the root partition. The problem is that grub-probe lists multiple disks for the boot folder, which caused the /etc/grub.
10_linux_zfs expects a single disk, but the following command returns several lines.
/usr/sbin/
jpb (jbrown10) wrote : Re: Upgrade from 19.04 to 19.10 with zfs on root fails with grub syntax error | #18 |
True, grub-probe will produce several lines. On my system it lists the actual mirror drives first, followed by the log device and then the cache, in the same order as zpool status rpool:
grub-probe --target=device /boot
/dev/sdc1
/dev/sda1
/dev/nvme0n1p2
/dev/nvme0n1p4
I'm not seeing in 10_linux_zfs where having multiple target devices would cause the dropping of the kernel and initrd from what was being generated. It looks to me like data is missing -- lost during the echo action of get_dataset_info().
The attachment "1848856.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
tags: | added: patch |
tags: | added: rls-ff-incoming |
jpb (jbrown10) wrote : | #20 |
Problem identified. The variable initrd_device contains a \r at the end of its value. When it is concatenated with the subsequent variables, it effectively truncates the subsequent data in the echoed return.
Preceding the if statement, I inserted the following:
# remove the trailing \r from the variable
initrd_
This removed the trailing \r, allowing the full set of data to be returned.
I tested this fix on my own system by modifying the 10_linux_zfs script in /etc/grub.d. The grub.cfg is now successfully generated and apt/dpkg are able to perform an update-grub.
10_linux_zfs attached.
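The exact line is truncated in the comment above; as an assumption about its shape rather than a copy of the attached script, a \r strip of that kind could look like this in shell:

# Assumed form of jpb's workaround (the real line is cut off above):
# strip carriage returns from the probed device before it is echoed back.
initrd_device="$(echo "${initrd_device}" | tr -d '\r')"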
jpb (jbrown10) wrote : | #21 |
zoolook (nbensa) wrote : | #22 |
I tested the fix proposed by jpb in #21 and it works for me.
Didier Roche (didrocks) wrote : | #23 |
Thanks for digging into this and finding the root cause!
I'm really wondering what is causing this additional \r on that variable compared to a standard installation with multiple kernels and initrds, which doesn't get this \r. Do you have any specific grub configuration in /etc/default/grub* or anything in /boot which could lead to this? It could be mirror-related, where grub-probe --target=device returns this \r. What's the value of $initrd_device?
The reason is that I would like to add this case to the testsuite to ensure we don't regress it in the future and so, trying to find the root cause (we can mock grub_probe in our testsuite to return what you got exactly).
jpb (jbrown10) wrote : | #24 |
The \r is an invisible return character, commonly returned by an echo. However, here, we are executing grub_probe --target=device /boot
The code in 10_linux_zfs:
initrd_
The results are the same as above. For the specific entry, it is the first drive in my mirror:
/dev/sdc1 (first drive as defined in my mirror)
/dev/sda1 (second drive as defined in my mirror)
/dev/nvme0n1p2 (zfs logs)
/dev/nvme0n1p4 (zfs cache)
Notice that the output from grub-probe spans several lines. New lines are usually \n but may be \r\n. I don't know what grub-probe is doing and didn't look to see how it does it, but its output would be where the \r comes from (again, it isn't visible).
I have nothing special or different from the stock Ubuntu /etc/default/grub other than I added zswap entries to the /etc/default/grub and those don't affect this.
The underlying cause may be that, because I have a mirrored root using zfs mirroring, multiple devices show up, and that may cause the \r to be returned. It's just a guess.
So, to test, you'd want grub_probe to return multiple devices and the easiest way would be to just have a mirror for the rpool.
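If a real mirror is not handy, one hypothetical way to fake the multi-line grub-probe output for a test, assuming the script under test resolves grub-probe through PATH (the actual grubzfs-testsuite mocking may work differently), is a stub placed ahead of the real binary:

# Hypothetical stub for testing only; the real grubzfs-testsuite mock may differ.
mkdir -p /tmp/mockbin
cat > /tmp/mockbin/grub-probe <<'EOF'
#!/bin/sh
# Mimic a mirrored pool with separate log and cache devices.
printf '%s\n' /dev/sdc1 /dev/sda1 /dev/nvme0n1p2 /dev/nvme0n1p4
EOF
chmod +x /tmp/mockbin/grub-probe
# Put the stub first on PATH before exercising 10_linux_zfs (which, per
# comment #8, also needs pkgdatadir and the GRUB* variables set).
export PATH=/tmp/mockbin:$PATH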
Changed in grub2 (Ubuntu): | |
importance: | Undecided → Medium |
Changed in grub2 (Ubuntu): | |
status: | Confirmed → Triaged |
assignee: | nobody → Jean-Baptiste Lallement (jibel) |
Jean-Baptiste Lallement (jibel) wrote : | #25 |
I'll strip the \r as a safety measure but I cannot reproduce this issue.
I created a mirror on 2 devices with log and cache devices on separate partitions of a third disk like your setup and the line separator is always a new line character (0x0a) without carriage return (0x0d).
Jean-Baptiste Lallement (jibel) wrote : | #26 |
Actually I think '\r' is a red herring. The problem is that if several devices are used for a pool (raid, log or cache), grub-probe returns several lines, and these multiple lines are not handled properly when we generate the line of metadata used to create the menu. The patch is to take the first device returned by grub-probe.
jpb (jbrown10) wrote : | #27 |
Agreed. It is a multi-line problem (which by its nature includes a '\r') not being handled. Happy to test the solution on my system.
Changed in grub2 (Ubuntu Focal): | |
status: | Triaged → Fix Released |
Changed in grub2 (Ubuntu Focal): | |
status: | Fix Released → Triaged |
status: | Triaged → In Progress |
summary: |
- Upgrade from 19.04 to 19.10 with zfs on root fails with grub syntax - error + zfs on root fails with grub syntax error with multidisk pools |
Changed in grub2 (Ubuntu Eoan): | |
importance: | Undecided → Medium |
Changed in grubzfs-testsuite (Ubuntu Eoan): | |
importance: | Undecided → Medium |
Changed in grubzfs-testsuite (Ubuntu Focal): | |
importance: | Undecided → Medium |
Changed in grub2 (Ubuntu Eoan): | |
assignee: | nobody → Jean-Baptiste Lallement (jibel) |
Changed in grubzfs-testsuite (Ubuntu Eoan): | |
assignee: | nobody → Jean-Baptiste Lallement (jibel) |
Changed in grubzfs-testsuite (Ubuntu Focal): | |
assignee: | nobody → Jean-Baptiste Lallement (jibel) |
Changed in grub2 (Ubuntu Eoan): | |
status: | New → Triaged |
Changed in grubzfs-testsuite (Ubuntu Eoan): | |
status: | New → Triaged |
Changed in grubzfs-testsuite (Ubuntu Focal): | |
status: | New → Triaged |
summary: |
- zfs on root fails with grub syntax error with multidisk pools + zfs on root fails with grub syntax error with multidisks pools |
description: | updated |
Darrin Burger (aenigma1372) wrote : | #28 |
I also tested the fix proposed by jpb in #21 and can confirm that it worked for me as well.
Launchpad Janitor (janitor) wrote : | #29 |
This bug was fixed in the package grubzfs-testsuite - 0.4.6
---------------
grubzfs-testsuite (0.4.6) focal; urgency=medium
[ Jean-Baptiste Lallement ]
[ Didier Roche ]
* Test cases for:
- Handle the case where grub-probe returns several devices for a single
pool (LP: #1848856).
- Do not crash on invalid fstab and report the invalid entry.
(LP: #1849347)
- When a pool fails to import, catch and display the error message and
continue with other pools. Import all the pools in readonly mode so we
can import other pools with unsupported features (LP: #1848399)
-- Jean-Baptiste Lallement <email address hidden> Mon, 18 Nov 2019 11:38:20 +0100
Changed in grubzfs-testsuite (Ubuntu Focal): | |
status: | Triaged → Fix Released |
Launchpad Janitor (janitor) wrote : | #30 |
This bug was fixed in the package grub2 - 2.04-1ubuntu14
---------------
grub2 (2.04-1ubuntu14) focal; urgency=medium
* debian/
- Handle the case where grub-probe returns several devices for a single
pool (LP: #1848856). Thanks jpb for the report and the proposed patch.
- Add savedefault to non-recovery entries (LP: #1850202). Thanks Deltik
for the patch.
- Do not crash on invalid fstab and report the invalid entry.
(LP: #1849347) Thanks Deltik for the patch.
- When a pool fails to import, catch and display the error message and
continue with other pools. Import all the pools in readonly mode so we
can import other pools with unsupported features (LP: #1848399) Thanks
satmandu for the investigation and the proposed patch
-- Jean-Baptiste Lallement <email address hidden> Mon, 18 Nov 2019 11:22:43 +0100
Changed in grub2 (Ubuntu Focal): | |
status: | In Progress → Fix Released |
Jean-Baptiste Lallement (jibel) wrote : | #31 |
debdiff to backport the fixes from focal to eoan
Jean-Baptiste Lallement (jibel) wrote : | #33 |
zoolook (nbensa) wrote : | #34 |
What is needed for a fix for 19.10?
isden (isden) wrote : | #35 |
> What is needed for a fix for 19.10?
The patch to 10_linux_zfs from "grub2_
zoolook (nbensa) wrote : | #36 |
I know it works. jpb proposed the first patch and I tested it (#22).
But what I'm waiting for is an "official" fix :-)
Brian Elliott Finley (finley) wrote : | #37 |
On Ubuntu 19.10, with ZFS root and zfs-auto-snapshot, /etc/grub.
This is a simple patch that just exits the get_dataset_info() function if the dataset is a snapshot. I could be missing something, but I don't expect we want to be finding kernel & initrd images on snapshots under normal circumstances.
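Brian's attached patch is not shown in this thread; a guess at the shape of such a guard, assuming get_dataset_info() receives the dataset name as its first argument, is a check for the '@' that only snapshot names contain:

# Hypothetical guard at the top of get_dataset_info(); not the actual patch.
# ZFS snapshot names always contain '@' (dataset@snapname), so bail out early.
case "${1}" in
    *@*) return 0 ;;
esac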
Jean-Baptiste Lallement (jibel) wrote : | #38 |
@finley, this is a different issue. It'll be easier to track in a separate report. Could you file one, please?
Thanks.
Jonathan Sailor (jsailor) wrote : | #39 |
@jibel, any chance of an SRU for this?
Simon Quigley (tsimonq2) wrote : | #40 |
19.10 is EOL, unsubscribed sponsors and removed the series.
Please resubscribe ~ubuntu-sponsors if a patch is still needed.
no longer affects: | grub2 (Ubuntu Eoan) |
no longer affects: | grubzfs-testsuite (Ubuntu Eoan) |
piersh (piersh) wrote : | #41 |
this problem still exists in 20.04
Looking closer at the 19.04 grub.cfg and the post19.10 grub.cfg.new I see that the 19.04 version of update-grub was using 10_linux instead of 10_linux_zfs to generate the grub.cfg.new.