GRUB's menu.lst modified in wrong way -> Error 15 File not found on next reboot

Bug #61108 reported by vachun on 2006-09-18
Affects: grub (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image-2.6.15-26-amd64-generic

When Linux is not on the first partition of the first hard disk, the system no longer boots after this kernel update (linux-image-2.6.15-26-amd64-generic_2.6.15-26.47_amd64.deb), because all grub entries are 'patched' to point to (hd0,0) and /dev/hda1.

This was verified on 2 machines:
1) Original menu.lst entry:

title Ubuntu, kernel 2.6.15-26-amd64-generic
root (hd0,1)
kernel /boot/vmlinuz-2.6.15-26-amd64-generic root=/dev/sda2 ro quiet splash
initrd /boot/initrd.img-2.6.15-26-amd64-generic
savedefault
boot

After the upgrade:

title Ubuntu, kernel 2.6.15-26-amd64-generic
root (hd0,0)
kernel /boot/vmlinuz-2.6.15-26-amd64-generic root=/dev/hda1 ro quiet splash
initrd /boot/initrd.img-2.6.15-26-amd64-generic
savedefault
boot

2) On the second machine, (hd0,5) was replaced with (hd0,0) and /dev/hda6 with /dev/hda1.
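
For readers unfamiliar with the naming used in these entries: GRUB legacy numbers partitions from 0, while Linux numbers them from 1, so (hd0,1) pairs with /dev/sda2 and (hd0,5) with /dev/hda6. A minimal sketch of that mapping follows. The helper name `grub_to_linux` and its `disk_prefix` parameter are made up for illustration, and the sketch assumes the BIOS disk order matches the Linux one, which is not guaranteed on systems mixing IDE and SATA drives.

```python
import re

def grub_to_linux(grub_device, disk_prefix="sd"):
    """Translate a GRUB legacy name such as '(hd0,1)' to a Linux device path.

    Assumption: the BIOS disk order matches the Linux one (hd0 -> first disk);
    disk_prefix is 'sd' or 'hd' depending on the driver in use.
    """
    match = re.fullmatch(r"\(hd(\d+),(\d+)\)", grub_device)
    if match is None:
        raise ValueError(f"not a GRUB legacy partition name: {grub_device!r}")
    disk, partition = int(match.group(1)), int(match.group(2))
    # GRUB legacy counts partitions from 0, Linux counts them from 1.
    return f"/dev/{disk_prefix}{chr(ord('a') + disk)}{partition + 1}"

print(grub_to_linux("(hd0,1)"))        # /dev/sda2
print(grub_to_linux("(hd0,5)", "hd"))  # /dev/hda6
```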

It was a nightmare, because it locked out one of our customers' servers on a Friday evening and we had to make a 300 km trip to fix things up :-(

Ben Collins (ben-collins) wrote :

The kernel doesn't modify the menu.lst, it only calls update-grub.

Ben Collins wrote:
> The kernel doesn't modify the menu.lst, it only calls update-grub.
>
> ** Changed in: linux-source-2.6.15 (Ubuntu)
> Sourcepackagename: linux-source-2.6.15 => grub
>
>
It is clear that the kernel itself doesn't touch menu.lst, but some
postinstall script certainly does.

--
Jan Vachun

Spintec s.r.l.
C.so Torino 89/A
Ferriera di Buttigliera Alta (TO), 10090
tel: +39 011 9348228
fax: +39 011 9348861

Warren Uniewski (cirus) wrote :

I also had the problem where grub chose the wrong root (hdx,y) for all partitions.

Changed in grub:
status: Unconfirmed → Confirmed

vachun (vachun-spintec) wrote :

The machine is locked out again with the same problem after the new kernel update
linux-image-2.6.15-27-amd64-generic (2.6.15-27.48).

Is there anybody who cares about Ubuntu users and about the perception of stability and reliability of Ubuntu as a distribution?!

I consider this a serious (if not critical) bug and have no way to label it as such or to escalate it. I contacted Ben Collins directly (see above), but he dismissed the question without attempting to verify the bug.

On our recommendation, our customers chose Ubuntu instead of Windows XP, but now they are starting to doubt their decision.

Lionel Le Folgoc (mrpouit) wrote :

> because all grub entries are 'patched' to point to (hd0,0) and /dev/hda1

Did you manually replace all occurrences of (hd0,0), or did you change the commented values (such as groot= and kopt=) and then run 'sudo update-grub'?

If you installed Ubuntu on the first partition, then moved it and manually edited (hd0,0), that is not a bug: this part is overwritten on each call of 'sudo update-grub'.
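
As background for the "commented values": in the Debian/Ubuntu menu.lst, lines that start with a single `#` followed by a setting (for example `# groot=(hd0,1)`) are read by update-grub as defaults, while lines that start with `##` are ordinary comments. The sketch below shows one way to read those defaults from the file's text. The function name `read_defaults` and the sample text are illustrative only, and the real update-grub script handles more settings than these two.

```python
import re

# Matches the "magic comment" settings, e.g. "# groot=(hd0,1)".
# Lines starting with "##" are ordinary comments and do not match.
SETTING = re.compile(r"^#\s*(groot|kopt)=(.*)$")

def read_defaults(menu_lst_text):
    """Return the groot/kopt defaults found in menu.lst text."""
    defaults = {}
    for line in menu_lst_text.splitlines():
        match = SETTING.match(line.strip())
        if match:
            defaults[match.group(1)] = match.group(2).strip()
    return defaults

# Hypothetical excerpt; values taken from the first machine in this report.
sample = """\
## e.g. groot=(hd0,0)
# groot=(hd0,1)
## e.g. kopt=root=/dev/hda1 ro
# kopt=root=/dev/sda2 ro
"""
print(read_defaults(sample))
```

If the values printed for a real /boot/grub/menu.lst do not match the partition the system actually lives on, every regenerated entry will inherit the wrong values.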

vachun (vachun-spintec) wrote :

Lionel Le Folgoc wrote:
>> because all grub entries are 'patched' to point to (hd0,0) and
>> /dev/hda1
>
> Did you manually replace all occurrences of (hd0,0), or did you change
> the commented values (such as groot= and kopt=) and then run
> 'sudo update-grub'?
>
>
Thank you for your hints.

I have been using Linux since 1993, but normally I use Red Hat-like
distros. We chose Ubuntu because it is easier for inexperienced users
and generally hassle-free. There are, of course, some differences
between Debian and Fedora in system file layout and tools. For example,
on Fedora Core 1 there is no update-grub tool, nor are there groot or
kopt entries in menu.lst.

My customers have had Ubuntu systems up and running for a couple of
months, and earlier kernel updates did not trigger this behavior.
Anyway, when the customer's machine was blocked because of the wrong
grub entries, I manually replaced all occurrences of (hd0,0) with
(hd0,1) and all occurrences of /dev/hda1 with /dev/sda2. Needless to
say, that lasted only 2 days, until the next kernel update and reboot ...

I will check the groot and kopt values in the menu.lst file, run
update-grub to see what happens, and let you know.


Tormod Volden (tormodvolden) wrote :

Jan, did you fix your #kopt line? Please attach your menu.lst if you still have problems.

Changed in grub:
status: Confirmed → Needs Info

i have modified both the groot line and the 2 kopt lines a couple of times now.

with this last kernel update i have become certain: my changes to both of these values in menu.lst are edited away by the script that updates grub.

i am sure those two lines were correct because this has happened a few times already.

the question is: what other file could hold the values that replace mine in menu.lst?

the groot value is reverted by some script from the correct (hd0,1) to the incorrect (hd0,0),
and the kopt value is reverted from the correct UUID=18942279-eaf5-4e86-a174-8b04b5d3127e ro to the incorrect UUID=3890B82E90B7F10C ro (this is my first windows partition)

i just tried updating menu.lst manually and then ran update-grub.
perhaps this error message is helpful. of course i'm certain of the UUID: i got it from /dev/disk/by-uuid and it works in fstab every time

Searching for GRUB installation directory ... found: /boot/grub
findfs: Unable to resolve 'UUID=18942279-eaf5-4e86-a174-8b04b5d3127e'
Cannot determine root device. Assuming /dev/hda1
This error is probably caused by an invalid /etc/fstab
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-2.6.20-10-generic
Found kernel: /boot/vmlinuz-2.6.20-9-generic
Found kernel: /boot/memtest86+.bin
Updating /boot/grub/menu.lst ... done

this gets weirder and weirder. when i run mount or df to check that the root partition is mounted, it doesn't show up... but my linux box is running with no problem. what is happening? any advice or help appreciated

sorry for the repeated comments, but now i'm sure the problem lies with findfs

when my fstab uses /dev/hda2, update-grub works correctly. but when my fstab uses the UUID (which is correct), update-grub fails and gives the above error message. i hope this will allow somebody to solve it...
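
The failure mode described here (update-grub reads the root device from /etc/fstab, cannot resolve the UUID, and falls back to /dev/hda1) can be checked for before running update-grub. The sketch below is not what update-grub itself does internally. It only finds the fstab entry mounted at / and reports whether its UUID is among a set of known ones. The helper names `root_spec` and `resolves` are hypothetical, and on a real system the known UUIDs could come from listing /dev/disk/by-uuid.

```python
def root_spec(fstab_text):
    """Return the first field of the fstab line whose mount point is '/'."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#") and fields[1] == "/":
            return fields[0]
    return None

def resolves(spec, known_uuids):
    """True if spec is a plain device path or a UUID present in known_uuids."""
    if spec.startswith("UUID="):
        return spec[len("UUID="):] in known_uuids
    return True

# Sample data: the UUIDs are the two quoted in this comment; the fstab line
# layout is an assumption made for the example.
fstab = "UUID=18942279-eaf5-4e86-a174-8b04b5d3127e / ext3 defaults 0 1\n"
known = {"3890B82E90B7F10C"}

spec = root_spec(fstab)
print(spec, "resolves" if resolves(spec, known) else "does NOT resolve")
```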

ok - never mind. this last update it respected the groot and kopt
settings...

Oncle Tom (oncletom) wrote :

Hello :)

I think I have the same problem, and I have had it since I upgraded from Dapper Drake to Edgy Eft (even after reinstalling the system from scratch). grub does not generate menu.lst properly, so I can't boot without manually editing the boot entry. It looks like a UUID problem, by the way: my fstab and blkid do not report the same UUID for my sda8 partition (my / one) as /dev/disk/by-uuid does.

My menu.lst is attached here as it was generated by my last upgrade (last Friday, Feisty Fawn).
(hd0,0) should be (hd0,7)
UUID=0fcf2f00-4ae6-1c00-0000-000002000000

For the sda8 partition, I have this entry in /etc/fstab :
# /dev/sda8
UUID=bc77bcba-0825-4875-8536-21295d7ffc2c / ext3 defaults,errors=remount-ro 0 1

blkid returns this to me :
/dev/sda8: UUID="bc77bcba-0825-4875-8536-21295d7ffc2c" SEC_TYPE="ext2" TYPE="ext3"

ls -al /dev/disk/by-uuid returns this :
lrwxrwxrwx 1 root root 10 2007-04-25 19:39 c4392b00-d50c-1d00-0000-000002000000 -> ../../sda8

:-/
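
The mismatch can be made explicit by parsing the two listings above and comparing the UUID each one reports for the same device. This is only a sketch: the function names `blkid_uuid` and `by_uuid_entry` are invented for the example, and the parsing assumes the output formats shown in this comment.

```python
import re

def blkid_uuid(line):
    """Extract (device name, UUID) from one line of blkid output."""
    device, _, rest = line.partition(":")
    match = re.search(r'\bUUID="([^"]+)"', rest)
    return device.rsplit("/", 1)[-1], match.group(1) if match else None

def by_uuid_entry(line):
    """Extract (device name, UUID) from one 'ls -al /dev/disk/by-uuid' line."""
    left, _, target = line.partition(" -> ")
    return target.strip().rsplit("/", 1)[-1], left.split()[-1]

blkid_line = '/dev/sda8: UUID="bc77bcba-0825-4875-8536-21295d7ffc2c" SEC_TYPE="ext2" TYPE="ext3"'
ls_line = ("lrwxrwxrwx 1 root root 10 2007-04-25 19:39 "
           "c4392b00-d50c-1d00-0000-000002000000 -> ../../sda8")

dev_a, uuid_a = blkid_uuid(blkid_line)
dev_b, uuid_b = by_uuid_entry(ls_line)
if dev_a == dev_b and uuid_a != uuid_b:
    print(f"{dev_a}: blkid says {uuid_a}, by-uuid says {uuid_b}")
```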

Tormod Volden (tormodvolden) wrote :

Jedi, I suggest you file a new bug against udev (which maps disk/by-uuid using vol_id) for the mismatch with blkid (from e2fsprogs).

Oncle Tom (oncletom) wrote :

OK thanks. I was not sure where the problem could come from ;-)

Tormod Volden (tormodvolden) wrote :

We are closing this bug report because it lacks the information, described in the previous comments, that we need to investigate the problem further. However, please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future.

Changed in grub:
status: Needs Info → Rejected

This is still a problem. I detail much of it here:
http://www.howtoadvice.com/DellUbuntu/

Tormod Volden (tormodvolden) wrote :

In Dell's case and in the original report, it was the result of a custom installation that didn't set #groot correctly.

As of Ubuntu 7.04, people install via the GUI, and there is no opportunity for them to set #groot. If they don't install their OS at (hd1,0), they'll lose their ability to boot the first time they do a system update that contains a kernel update!

Let's take responsibility here, and help people avoid this #groot technicality. It can be abstracted and it can be automated.

To Do:

-Add "wise auto #groot setting" functionality to the GUI installation process.

-Make the update-grub command take the "root partition of the previous kernel" into consideration when setting the current kernel's root partition.

-Consider the ramifications of "preventing the update-grub command from overwriting all kernels' root partitions with #groot". If you look at the "before and after" of the /boot/grub/menu.lst file (at http://www.howtoadvice.com/DellUbuntu/ ), you'll notice that update-grub not only sets the newly added kernel's root partition to #groot, it also overwrites the root partitions of all other kernel entries with #groot. How is this ever useful? A default usually means a value you supply when the user doesn't specify one. However, the previous kernel entries in menu.lst have already been specified. Why overwrite them with a default value? Now, in addition to not being able to boot the new kernel, the new Ubuntu user can't even boot the previous kernel that worked fine before a simple system update with the Ubuntu Update Manager.

The bottom line is this. We can't allow a simple system update (with Ubuntu's Update Manager) to cause a machine not to boot. We have to think this through and make the kernel updater smart enough to choose the previous kernel's root partition!

Tormod Volden (tormodvolden) wrote :

Lonnie, it's not that bad. The installer sets #groot correctly for most people. It's usually when you have RAID, or a mix of IDE/SATA drives, that it guesses wrong. But I agree that it shouldn't make these poor guesses every time you install a new kernel.

It makes sense to have one #groot for all kernels installed on the same system (= same boot partition). #groot is what the grub loader is told at boot, to know where to find your kernels. Since your kernels are all on one partition, there's only one way to specify that partition -> one #groot.
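
To illustrate why a single wrong #groot breaks every entry at once, here is a toy model of the regeneration step. It is not the real update-grub, which is a shell script handling many more options. The function name `regenerate_entries` is made up, and the stanza layout simply mirrors the entries quoted in the bug description.

```python
def regenerate_entries(groot, kopt, kernel_versions):
    """Build one menu.lst stanza per kernel from the shared defaults."""
    stanzas = []
    for version in kernel_versions:
        stanzas.append("\n".join([
            f"title Ubuntu, kernel {version}",
            f"root {groot}",  # the same groot is written into every stanza
            f"kernel /boot/vmlinuz-{version} {kopt}",
            f"initrd /boot/initrd.img-{version}",
        ]))
    return stanzas

# With the wrong defaults, the old kernel's stanza is rewritten as well.
for stanza in regenerate_entries("(hd0,0)", "root=/dev/hda1 ro",
                                 ["2.6.15-27-amd64-generic",
                                  "2.6.15-26-amd64-generic"]):
    print(stanza, end="\n\n")
```

Because every stanza is rebuilt from the same two values, manual edits to individual entries do not survive the next run; only corrected defaults do.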

For more discussion, I suggest you write up a proposal on a wiki page, and work towards a specification. OTOH, grub is so system-critical that Ubuntu is probably not going to deviate far from Debian, and it should get fixed there first. However, grub developers are more interested in grub2 than in reworking the old, crufty grub.
