grub fails to load stage 2 when serial console uses --stop=[1|2]

Bug #220336 reported by Dustin Kirkland  on 2008-04-21
Affects Status Importance Assigned to Milestone
grub (Ubuntu)
Undecided
Unassigned

Bug Description

Binary package hint: grub

A curious error with grub's serial console configuration.

Initially, we only saw this on a single machine, an HP DL145.

We have reports of this happening on other hardware, as well as other Linux distributions.

An installation is performed with console=ttyS0,9600n8 on the kernel command line. Install progresses smoothly, with everything visible in the console. This indicates that the hardware is okay, considering the Linux kernel can output and gather input from the serial console.

The installer writes /boot/grub/menu.lst with the first two lines being:

 serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
 terminal serial

Additionally, console=ttyS0,9600n8 is appended on each of the kernel boot lines in that same file.

It seems that the kernel console redirection to serial works, but the grub serial console does not. In fact, it hangs grub just after loading stage 1.5.

The next step would be to build and install a special debug grub package on that machine which would verbosely log its doings on load. This would require an IS person in the lab to watch the physical console, as any remote developer is entirely blind at any time when reproducing this problem.

*** WORK AROUND ***

The problem can be worked around by commenting out in menu.lst:
# serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
# terminal serial

This removes the ability to see the Grub menu; however, as soon as the kernel is booted with console=ttyS0,9600n8, the serial console is associated again.
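
For anyone who needs to apply this workaround on several machines, here is a minimal sketch in Python of the same edit. It only transforms the text of a menu.lst; the function name comment_out_serial and the sample kernel line are made up for illustration, and reading or writing the real /boot/grub/menu.lst (and backing it up first) is left to the caller.

```python
import re

# Matches an active (not already commented) "serial" or "terminal" directive.
_DIRECTIVE = re.compile(r"^\s*(serial|terminal)\b")


def comment_out_serial(menu_text: str) -> str:
    """Return menu.lst text with active serial/terminal lines commented out.

    Hypothetical helper, not part of grub or any Ubuntu tool.
    """
    out = []
    for line in menu_text.splitlines(keepends=True):
        if _DIRECTIVE.match(line):
            # Prefix with "# " so grub ignores the directive.
            out.append("# " + line.lstrip())
        else:
            out.append(line)
    return "".join(out)


# Sample input: the two directives from this report plus an
# illustrative (assumed) kernel line that must stay untouched.
sample = (
    "serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1\n"
    "terminal serial\n"
    "kernel /vmlinuz root=/dev/sda1 console=ttyS0,9600n8\n"
)

print(comment_out_serial(sample), end="")
```

Lines that are already commented are left alone, so running it twice gives the same result as running it once.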

Note that the only two values allowed for stop are 1 and 2. See:
http://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/preparation-setspeed.html

Setting --stop=0 in menu.lst "works", but for the wrong reason: 0 is an invalid value, so Grub just skips that line. Commenting the line out therefore makes more sense.
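
To spot a menu.lst that only boots because of an invalid stop value, a small check like the following may help. It is a rough sketch in Python that assumes, as stated above, that 1 and 2 are the only valid values; the function name invalid_stop_lines is made up, and the parsing is a simple whitespace split, not grub's own parser.

```python
VALID_STOP_BITS = {"1", "2"}  # per the note above: only 1 and 2 are allowed


def invalid_stop_lines(menu_text):
    """Return (line_number, value) pairs for active serial lines
    whose --stop value is not 1 or 2. Hypothetical helper."""
    problems = []
    for number, line in enumerate(menu_text.splitlines(), start=1):
        parts = line.split()
        # Skip blank lines, comments, and every directive other than "serial".
        if not parts or parts[0] != "serial":
            continue
        for part in parts[1:]:
            if part.startswith("--stop="):
                value = part[len("--stop="):]
                if value not in VALID_STOP_BITS:
                    problems.append((number, value))
    return problems


print(invalid_stop_lines("serial --unit=0 --speed=9600 --stop=0\nterminal serial\n"))
```

An empty list means no active serial line carries an out-of-range --stop value; it says nothing about whether grub will actually hang.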

:-Dustin

Mathias Gug (mathiaz) wrote :

I've come across this bug on another machine, not an hp DL145.

Dustin Kirkland  (kirkland) wrote :

The following is from an email sent to me. I'm quoting it verbatim with the permission of its author. It seems that this problem is more pervasive than affecting only Ubuntu and only HP machines:

-------------
Dustin,

I found your bug on the Ubuntu tracker
(https://bugs.launchpad.net/ubuntu/+source/grub/+bug/220336) and wanted
to add some additional info. I don't have an account nor do I use
Ubuntu but I thought this might be helpful to add to the bug report as
the console redirection issue is a far older and wider one than Ubuntu
on recent HP hardware. Hope it helps someone figure out how to fix it
because it drives me crazy:

http://www.ghidinelli.com/2006/11/06/configuring-console-access-for-linuxcentos/

In short, I've experienced this on RHEL4 and 5 (in actuality, CentOS
4/5) on various VA Linux hardware from ~2000 but the interesting thing
is that it works some of the time and not others, all with identical
grub/terminal configurations.

Hopefully the bug gets some attention because it's hugely
inconvenient... it hard locks the system and requires a power-button
reset to reboot.

Brian Ghidinelli
-------------

Dustin Kirkland  (kirkland) wrote :

Confirmed, as this is seen by at least Mathias and Brian Ghidinelli.

Changed in grub:
status: New → Confirmed
description: updated

Ahunt83 (andrew-hunt) wrote :

Having the same hang at Grub stage 1.5 on brand new Dell R200 servers running openSUSE 10.3. The terminal timeout is set to 10, and we get 10 "press any key to continue" messages and then a full system hang requiring a hard reboot.

If we do press any key on a connected console (using Dell's Serial Over LAN) or locally before the end of the timeout, then it boots fine, so it seems to be a bug in continuing at the end of the wait time.

Removing the terminal line from /boot/grub/menu.lst seems to fix the issue on our servers. The console in this case is sent by the BMC to both the local screen and the remote console with no timeout, so it works a treat. This may only work with Dell's BMC/SOL, but I thought I'd mention it in case anyone else has spent a day getting frustrated with this like we have.
