GRUB_RECORDFAIL_TIMEOUT prevents servers from booting

Bug #1438275 reported by Fred on 2015-03-30
This bug affects 5 people
Affects: grub2 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

On Ubuntu, if there's a power failure or the system is otherwise left in a bad state on its last boot, GRUB2 will hang indefinitely, waiting for user input to select the kernel to boot and any relevant kernel command-line options. This is annoying because most of the test systems are headless: when a system is unresponsive, the server operator has to attach an HDMI display and a USB keyboard, only to find out that there was no serious system error, just the system hanging at GRUB for whatever reason at that time.

Michael Larabel has problems booting his Ubuntu server farm because GRUB is configured to use GRUB_RECORDFAIL_TIMEOUT.

According to Michael, Ubuntu seems to be the only major distribution that uses GRUB_RECORDFAIL_TIMEOUT and therefore runs into this issue after a failed boot.

If indeed Ubuntu is the only distribution with this setting, then maybe it should be changed?

http://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-Common-GRUB2-Issue

description: updated
Steve Langasek (vorlon) wrote :

The behavior you describe is the result of a deliberate design decision, not a bug.
 - A power cut after the system has fully booted does not result in a GRUB prompt.
 - A power cut before the system has fully booted is indistinguishable from any other boot failure. The only thing we can say with certainty is that the system failed to boot. We absolutely do not want to try to boot a second time with the same options without giving the admin a chance to interact with the system.

If a power cut on a *booted* system is resulting in the user being thrown to a bootloader prompt, that would be a bug. But that is not the behavior being described here.

We are not going to degrade the experience for users who need to debug/recover from a failed boot in exchange for optimizing for the unusual case of someone cutting the power in the middle of the boot and expecting a non-interactive subsequent boot.

affects: grub (Ubuntu) → grub2 (Ubuntu)
Changed in grub2 (Ubuntu):
status: New → Invalid
Tobin Davis (gruemaster) wrote :

Maybe instead of 'degrading the experience', you can just document the change so that people using this OS for testing systems can automate around it without having to resort to serial port output scanning and lab wiring contortion.

In our test environment, we often see an MCE (machine check exception) that forces the system to reboot (even with nomce on the kernel cmdline). The system then often ends up sitting at grub. We worked around the issue by adding this to /etc/default/grub:

GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1"

Then we can monitor and automate the serial console for when grub gets stuck. It took a while to figure out the correct grub settings as they too were undocumented.

Steve Langasek (vorlon) wrote :

If you really want to alter the behavior of grub on failed boot, so that it will non-interactively reboot instead of waiting indefinitely, you can also just set a different value for GRUB_RECORDFAIL_TIMEOUT in /etc/default/grub (instead of the default of -1).
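For anyone landing here, the change suggested above might look roughly like the fragment below. This is only a sketch: the 30-second value is an arbitrary assumption, not a recommendation from this thread, and any non-negative number of seconds should make GRUB continue with the default entry after that delay instead of waiting forever.

```shell
# Fragment for /etc/default/grub (the file is read as shell variable assignments).
# 30 is an assumed example value; the default of -1 means "wait indefinitely".
GRUB_RECORDFAIL_TIMEOUT=30
```

After editing the file, regenerate the GRUB configuration with `sudo update-grub` so the new timeout is picked up on the next boot.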
