For a few weeks now, update-grub has no longer been detecting my root-on-ZFS install. This system does not follow the bpool/rpool layout, as it was installed a long time ago.
I was able to pinpoint the issue to this part of the 10_linux_zfs script:
if [ -n "$(ls ${candidate_path} 2>/dev/null)" ]; then
    echo "${candidate_path}"
    return
fi
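This code seems to identify candidate locations for /boot directories: any non-empty candidate is returned right away, so the /boot directory on the root dataset is expected to be empty when the boot files live on a separate dataset. ZFS does not require /boot to be empty to mount the target boot dataset on it (overlay=on), but I assume the check exists so that other candidate paths can be skipped quickly.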
So I think I basically ran into this issue: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1881442.
When grubenv gets created in /boot/grub on the ZFS root dataset during a failed boot sequence, 10_linux_zfs skips the install from then on, leaving grub.cfg without valid boot entries.
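To make the failure mode concrete, here is a minimal sketch that runs the same `ls`-based condition against a scratch directory standing in for /boot on the root dataset. The wrapper name `original_check` and the temporary paths are hypothetical; only the `if` condition mirrors the snippet from 10_linux_zfs (with the variable quoted, which does not change behavior for these paths).

```shell
#!/bin/sh
# Sketch: apply the ls-based test from 10_linux_zfs to scratch directories.
# original_check is a hypothetical wrapper; only its condition mirrors the script.

original_check() {
    candidate_path="$1"
    # Any non-empty directory is echoed back as the boot directory.
    if [ -n "$(ls "${candidate_path}" 2>/dev/null)" ]; then
        echo "${candidate_path}"
        return
    fi
}

scratch="$(mktemp -d)"
mkdir -p "${scratch}/boot"

# Empty /boot (boot files live on a separate dataset): nothing is echoed.
echo "empty: '$(original_check "${scratch}/boot")'"

# A stray /boot/grub/grubenv appears on the root dataset...
mkdir -p "${scratch}/boot/grub"
touch "${scratch}/boot/grub/grubenv"

# ...and the same check now returns the root dataset's /boot.
echo "grub only: '$(original_check "${scratch}/boot")'"

rm -rf "${scratch}"
```

A single `grub` subdirectory is enough to make the directory non-empty, which is why one stray grubenv changes the outcome.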
I was able to reproduce the issue by creating and removing /boot/grub on my root dataset, and to fix it by letting the 10_linux_zfs script continue when grub is the only entry in candidate_path:
if [ -n "$(ls "${candidate_path}" 2>/dev/null)" ] && [ "$(ls "${candidate_path}" 2>/dev/null)" != "grub" ]; then
    echo "${candidate_path}"
    return
fi
This fixes it for me.
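The patched condition can be checked the same way. This is a minimal sketch under the same assumptions: `patched_check` is a hypothetical wrapper, the scratch directory stands in for /boot on the root dataset, and `vmlinuz` is just a placeholder file name for a kernel.

```shell
#!/bin/sh
# Sketch of the patched condition: a /boot that contains nothing but "grub"
# is no longer returned as the boot directory. patched_check is hypothetical.

patched_check() {
    candidate_path="$1"
    # List once and reuse the result for both tests.
    contents="$(ls "${candidate_path}" 2>/dev/null)"
    if [ -n "${contents}" ] && [ "${contents}" != "grub" ]; then
        echo "${candidate_path}"
        return
    fi
}

scratch="$(mktemp -d)"
mkdir -p "${scratch}/boot/grub"
touch "${scratch}/boot/grub/grubenv"

# Only grub/ present: skipped, so the search can continue to other candidates.
echo "grub only: '$(patched_check "${scratch}/boot")'"

# A kernel next to grub/ (/boot really on the root dataset): selected again.
touch "${scratch}/boot/vmlinuz"
echo "with kernel: '$(patched_check "${scratch}/boot")'"

rm -rf "${scratch}"
```

The comparison is against the whole `ls` output, so any additional visible entry next to `grub` still makes the directory count as the boot directory; like the original check, `ls` without `-A` ignores dotfiles.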
Even once the grub-initrd-fallback.service bug is addressed, I would still allow /boot/grub to exist on the root dataset. Otherwise, that bug seems quite critical, since it leaves grub.cfg without valid boot entries.