Comment 19 for bug 222421

Paul Taylor (paul-taylor-london) wrote :

Thanks to Steve Langasek for thinking aloud about this important bug.

In his faulty sequence, and on the occasion that the bug hit me, the crucial failure is that
** there is a kernel entry in menu.lst with no accompanying initrd **
(in particular, the default entry had this).

Looking at the code in /usr/sbin/update-grub, I see that write_kernel_entry() and the loop
in output_kernel_list() where it is called seem to have been written for the explicit possibility
that initrd does not exist.

First, is this a legitimate possibility? Even if it is, could it be eliminated by generating a dummy
initramfs, even where it is not needed, or a file with the appropriate name but some cipher as
its content to indicate to write_kernel_entry() that initramfs is INTENTIONALLY absent?

What I propose is that update-grub should ONLY generate a kernel entry in menu.lst when
the kernel and initramfs actually exist. (Maybe it could also call some program to verify
that they are intact.) Failing this, it could print an error message to STDERR and leave a
comment in menu.lst concerning the defective kernel.
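
To make the proposal concrete, here is a minimal sketch of the check I have in mind. It is
written in Python purely for illustration (update-grub itself is a shell script), and the
names kernel_entry() and present() are my own inventions, not anything in the real code. It
assumes that an initrd is always wanted, and it leaves out the kernel options that a real
entry would carry.

```python
import os
import sys


def present(path):
    """Cheap sanity check: a regular file that is not empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def kernel_entry(title, kernel, initrd, boot_dir="/boot"):
    """Return menu.lst lines for one kernel, or a single comment line if
    the kernel or its initrd is missing (hypothetical helper)."""
    missing = [name for name in (kernel, initrd)
               if not present(os.path.join(boot_dir, name))]
    if missing:
        reason = "missing or empty: " + ", ".join(missing)
        # Complain on STDERR and leave a trace in menu.lst instead of
        # writing an entry that cannot boot.
        print("update-grub: no entry written for %s (%s)" % (title, reason),
              file=sys.stderr)
        return ["# NOT BOOTABLE, entry omitted: %s (%s)" % (title, reason)]
    # Kernel options (root=..., ro, and so on) are omitted in this sketch.
    return ["title " + title,
            "kernel /boot/" + kernel,
            "initrd /boot/" + initrd]
```

With a check like this, a failed mkinitramfs would cost me one menu entry rather than
the ability to boot.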

Bugs are like problems with one's house: some are like torn wallpaper, which can be left until
you have both the time and the degree of irritation to fix them, whilst others are like rainwater
dripping on to your bed from a broken rooftile, which have to be fixed straight away.

Now, I use my computer for email and wordprocessing, and wouldn't bother to make much
fuss about bugs in packages. But, as I said above, this one left my machine in an unusable
state, which I could only fix because I had had a bit of experience as a sysadmin. This is like
water coming through the roof, and it should have the highest classification in Launchpad.

So I consider that there should be a belt-and-braces solution in this case. Ubiquity has
overall responsibility for installation, and it should double-check, after the other programs
have done their job, whether the system is safe and will be able to boot. If not, it should
scream loudly to the user, try to fix the problem, and give copious debugging information,
including the address of Launchpad.
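
The double-check could be as simple as reading menu.lst back and confirming that every file
it names is really there. The following is only a sketch under my own assumptions, and it is
not Ubiquity's code: find_broken_entries() is a made-up name, grub_root stands for wherever
the partition that GRUB treats as its root is mounted, and the rule that a kernel called
vmlinuz-* ought to have an initrd line is a heuristic of mine.

```python
import os
import re


def find_broken_entries(menu_text, grub_root="/"):
    """Return (title, problem) pairs for menu.lst entries that look unbootable."""

    def resolve(path):
        # Drop an optional GRUB device prefix such as "(hd0,0)".
        path = re.sub(r"^\([^)]*\)", "", path)
        return os.path.join(grub_root, path.lstrip("/"))

    # Collect the kernel and initrd paths of each "title" stanza.
    stanzas = []
    for line in menu_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "title":
            stanzas.append({"title": " ".join(words[1:])})
        elif stanzas and words[0] in ("kernel", "initrd") and len(words) > 1:
            stanzas[-1][words[0]] = words[1]

    problems = []
    for stanza in stanzas:
        kernel = stanza.get("kernel")
        if kernel is None:
            continue  # for example, a chainloader entry for another OS
        initrd = stanza.get("initrd")
        if not os.path.isfile(resolve(kernel)):
            problems.append((stanza["title"], "kernel not found: " + kernel))
        if initrd is None:
            # Heuristic (assumption): a vmlinuz-* kernel needs an initrd.
            if os.path.basename(kernel).startswith("vmlinuz"):
                problems.append((stanza["title"], "no initrd line"))
        elif not os.path.isfile(resolve(initrd)):
            problems.append((stanza["title"], "initrd not found: " + initrd))
    return problems
```

An empty result would mean that every entry at least points at files that exist. Anything
else is what the installer should scream about before the user reboots.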

I am not actually sure that Steve's sequence is correct, because in my case there was a .bak
version of the initramfs (and not the properly named one). I think that my logfile above
shows that mkinitramfs failed. I don't know why the .bak file existed, but it worked when
I renamed it and rewrote menu.lst manually. Maybe mkinitramfs created the image correctly,
with the .bak name, but failed just before it changed the name.

I think that my logfile above shows that mkinitramfs failed because of a lack of memory. When
the next kernel came along, I was unable to install it for exactly this reason, and apt-get
got into a confused state in which it would neither install nor remove packages. This forced
me to buy some more RAM. (I also had far too small a partition for /, and later fixed this too.)
However, this problem has still not completely gone away - the machine hangs whenever
I run any of the Ubuntu GUI package management tools, and so I just use apt-get from the
terminal instead.

But, once again, even if my hardware falls below the recommended spec, this is no excuse
for leaving my machine in an unusable state without any explanation of what has really gone
wrong. Presumably it is part of the Ubuntu philosophy that people without the money to
buy the latest and sexiest computers still have the right to use a decent operating system.

Finally, I noticed from a search of Launchpad that there have actually been lots of reports of
this bug, dating back to 2006.