ucf and update-grub semantics are incompatible
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
grub (Ubuntu) | New | Undecided | Unassigned | |
Bug Description
Binary package hint: grub
Ubuntu hardy (amd64)
grub 0.97-29ubuntu21
The Ubuntu-specific modification to use ucf for managing menu.lst breaks the traditional update-grub semantics in fundamental ways. If I understand the architecture correctly, I'm not sure how using ucf could ever support the normal workflow that's possible using the Debian update-grub.
ucf's goal is to preserve local changes to a configuration file, but its notion of local changes is contrary to how update-grub is supposed to work. Suppose that you're a large organization with a lot of systems with a variety of kernel revisions, for one reason or another, but with a desire to have uniform kernel parameters like console speed and timeouts. With update-grub, you can just copy your menu.lst template to all of your systems and run update-grub afterwards, and update-grub will apply your new defaults and generate a new kernel list using them. However, with Ubuntu's update-grub, it generates the initial menu.lst and kernel list, and then when you replace that with a fresh template, ucf sees that you've removed the kernel list. It thinks that's a local modification, and when update-grub generates a new kernel list, it "preserves" your modification and leaves the kernel list out of the new menu.lst. Tada, instant unbootable system.
Even worse, this bug is sticky: you can re-run update-grub all you want, it succeeds without errors, and it keeps generating an empty menu.lst. This is a completely mystifying error until you dig into the details of what's going on. To recover, you have to run ucf --purge /var/run/
The specific problem I had was that we were copying our template over and re-running update-grub in an FAI post-install script, resulting in a system that always had an empty kernel menu and wouldn't boot. Once I finally figured out what's going on, I worked around it by being very careful to put our template in place before the initial system bootstrap and then never changing menu.lst afterwards, but this is a bad limitation for a large site that scales system administrators by automating configuration file updates.
It's possible that the right solution here is to change to a completely different configuration that uses a separate file for the template and explicitly supports custom kernel menu entries rather than reverting to the Debian behavior, but the current solution in Ubuntu breaks capabilities of update-grub that we expected to be able to use. Reverting to the Debian behavior, which we've never had any trouble with, would be an improvement over the current Ubuntu version for us.
I just want to add a me too for this bug. I'm not using any custom kernels, but I have a noninteractive script that does my software updates, so I set UCF_FORCE_CONFFOLD=YES in the script, and this causes nothing but problems for the grub menu.lst configuration.
Thanks to this bug report, it does seem that I can do this to work around the problem in my script:
unset UCF_FORCE_CONFFOLD
export UCF_FORCE_CONFFNEW=YES
ucf --purge /var/run/grub/menu.lst
update-grub
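For anyone who wants to reuse this in an unattended script, here is a minimal sketch of the same steps wrapped in a function. The names `run`, `regen_menu_lst`, and the `DRY_RUN` switch are my own inventions for illustration, not part of grub or ucf; the state file path is the one mentioned in this report and may differ on other releases.

```shell
#!/bin/sh
# Sketch only: wraps the workaround from this report in a reusable function.
# run, regen_menu_lst and DRY_RUN are hypothetical names, not grub/ucf features.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Forget ucf's recorded state for menu.lst, then regenerate it.
# $1: ucf state file (assumed default taken from this bug report).
regen_menu_lst() {
    state_file="${1:-/var/run/grub/menu.lst}"
    (
        # Subshell, so the caller's UCF_* settings are left untouched.
        unset UCF_FORCE_CONFFOLD
        UCF_FORCE_CONFFNEW=YES
        export UCF_FORCE_CONFFNEW
        run ucf --purge "$state_file" &&
        run update-grub
    )
}
```

Calling `regen_menu_lst` after the package upgrades finish should be enough; with `DRY_RUN=1` set it only prints the two commands, which is handy for checking the script on a machine you would rather not touch.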
I have spent hours trying to figure out why update-grub would say it found new kernels, yet never update the menu.lst. There should at least be an error message of some sort if the menu.lst didn't actually get updated with the kernels it said it found. For example, without purging /var/run/grub/menu.lst, here is the output of update-grub with no UCF environment variables set:
Searching for GRUB installation directory ... found: /boot/grub
Searching for default file ... found: /boot/grub/default
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-2.6.24-21-generic
Found kernel: /boot/vmlinuz-2.6.24-19-generic
Found kernel: /boot/memtest86+.bin
Updating /boot/grub/menu.lst ... done
At this point menu.lst does not have the 2.6.24-21-generic kernel. This kernel was installed with my noninteractive script, so UCF_FORCE_CONFFOLD=YES would have been set when it was initially installed.
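Since update-grub reports success even when the kernel never makes it into the file, a script can check for itself afterwards. The following is a rough sketch of such a check; `list_missing_kernels` is a hypothetical helper of mine, and it only does a plain string match of each `vmlinuz-*` file name against the non-comment lines of menu.lst, so treat it as a heuristic rather than a real parser.

```shell
#!/bin/sh
# Sketch only: list_missing_kernels is a hypothetical helper, not part of grub.
# Usage: list_missing_kernels [BOOT_DIR] [MENU_LST]
# Prints every BOOT_DIR/vmlinuz-* whose file name does not appear on a
# non-comment line of MENU_LST. Returns 1 if any is missing, 0 otherwise.
list_missing_kernels() {
    boot_dir="${1:-/boot}"
    menu_lst="${2:-/boot/grub/menu.lst}"
    missing=0
    for kernel in "$boot_dir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue    # the glob matched nothing
        name=$(basename "$kernel")
        # Drop comment lines first, then look for the kernel file name.
        if ! grep -v '^[[:space:]]*#' "$menu_lst" | grep -qF -- "$name"; then
            echo "missing from $menu_lst: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `list_missing_kernels || exit 1` right after update-grub would at least make an unattended update fail loudly instead of leaving a menu.lst that lacks a kernel it claimed to have found.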