Timeout should not be -1 if $recordfail

Bug #669481 reported by Mikael Nordfeldth
This bug affects 30 people
Affects           Status         Importance   Assigned to   Milestone
grub2 (Ubuntu)    Fix Released   Medium       Unassigned
Oneiric           Fix Released   Medium       Unassigned
Precise           Fix Released   Medium       Unassigned

Bug Description

Binary package hint: grub2

SRU justification

[Impact]
If a system fails to boot to the point where it runs /etc/init.d/grub-common, then a subsequent reboot will leave the system at a grub prompt which waits indefinitely for user input. Cloud instances and other headless systems may not be able to provide the necessary input to grub, which leaves the system unavailable.

[Test case]
A thorough test case is mentioned in comment #8 and has been used to confirm that the Quantal fix was applicable.

One shortcut to testing the fix is to disable execution of /etc/init.d/grub-common by running:

$ sudo chmod -x /etc/init.d/grub-common

Without the fix, a reboot will result in the instance waiting at the GRUB prompt. With the proposed fix, it will boot normally.
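
Before rebooting, it can help to confirm whether the recordfail flag is actually set in the GRUB environment block. The following is a minimal sketch, assuming the environment block lives at /boot/grub/grubenv and that `grub-editenv ... list` prints one name=value pair per line. The helper name recordfail_is_set is hypothetical and not part of grub2.

```shell
# Hypothetical helper: reads name=value lines on stdin (the format printed by
# `grub-editenv FILE list`) and succeeds only if recordfail=1 is among them.
recordfail_is_set() {
    grep -qx 'recordfail=1'
}

# Example use on a real system (assumed path, not run here):
#   sudo grub-editenv /boot/grub/grubenv list | recordfail_is_set \
#       && echo "recordfail is set; the next boot will use the recordfail timeout"
```

If the flag is set after the chmod above and a reboot, the test case has been reproduced.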

[Regression Potential]
There is little regression potential: unless /etc/default/grub is intentionally modified, the recordfail timeout defaults to its previous value of -1.

In my experience, $recordfail, which grub.cfg tests by default (through grub.d/00_header), is not always written successfully.

This causes grub2 to set timeout=-1 which removes the timeout for bootup in these cases.

On computers that use USB keyboards but have not been configured for "USB Legacy" in the BIOS, this looks as if grub has frozen, because the keyboard does not respond. The consequence is that the computer won't boot unless the user either knows how to configure the BIOS (and which setting to change) or uses a PS/2 keyboard (not always available or possible to plug in).

My suggestion is that Ubuntu recognizes $recordfail but, instead of disabling the timeout (-1), sets it to a relatively high value (such as 10 seconds). This will be enough to show the user which option is booting and that the computer hasn't frozen, while still allowing false or known failures to be ignored.
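
The proposed behaviour can be sketched as a small shell function. This is an illustration of the decision only, not the contents of grub.d/00_header; the function name choose_timeout and its arguments are made up for the example.

```shell
# Illustrative stand-in for the timeout choice made at boot.
# $1 = recordfail flag (1 when the previous boot was not recorded as successful)
# $2 = normal menu timeout in seconds
# $3 = timeout to use after a failed boot (the report suggests e.g. 10;
#      the behaviour being reported corresponds to -1, i.e. wait forever)
choose_timeout() {
    if [ "$1" = 1 ]; then
        echo "$3"
    else
        echo "$2"
    fi
}
```

With a finite third argument, a machine with no usable keyboard still boots after the delay instead of waiting at the menu indefinitely.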

Maybe a future addition to recognising $recordfail is to have a warning on the boot menu, but that is outside the scope of this report.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: grub-pc 1.98+20100804-5ubuntu3
ProcVersionSignature: Ubuntu 2.6.35-22.35-generic-pae 2.6.35.4
Uname: Linux 2.6.35-22-generic-pae i686
Architecture: i386
Date: Mon Nov 1 15:33:30 2010
InstallationMedia: Ubuntu-Server 10.10 "Maverick Meerkat" - Release i386 (20101007)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: grub2

Related bugs:
 * bug 1035279: instance hangs at grub prompt after reboot followed by euca-reboot-instances
 * bug 872244: grub2 recordfail logic prevents headless system from rebooting after power outage

Mikael Nordfeldth (mmn) wrote :
Colin Watson (cjwatson)
Changed in grub2 (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
hackeron (hackeron) wrote :

This is a major problem. Sometimes if there is a power failure, or some other kind of problem like a kernel oops, you send a reboot request to the server and it never boots because of this timeout -1 :/

I had to drive 40 miles to plug a keyboard in and press enter because of this. Please, please, please change this to a reasonable timeout or make this easily changeable without having to modify grub.d code :/

Patrick Bouffard (patrick-m-bouffard) wrote :

Bump - though I don't have to drive 40 miles, I do have to plug a keyboard and monitor into my robot which otherwise needs neither. If I'm using it in the field that means I have to lug this extra equipment with me all the time. :/

Barosl LEE (barosl) wrote :

Same problem here. I've occasionally suffered from servers getting stuck in the GRUB phase. Now I know the reason. The recordfail feature can be useful, but setting the timeout to -1 is excessive in my opinion.

As stated above, changing it to a reasonable value, or at least adding an option in /etc/default/grub to change the behavior, would greatly help. These changes are particularly important in the server edition.

Daniel Ellis (danellisuk) wrote :

This is also a problem for MythTV. The system automatically powers on/off to record TV programs. This issue can mean the machine fails to boot and therefore will miss recordings. It also means the machine will be stuck in this state until someone intervenes. Particularly bad if you are on vacation!

Sean DS (se4n-1) wrote :

Like others who have commented, I am also affected by this bug. To save juice I use rtcwake and have been careful to remove recordfail altogether. However, when I left my computer this Christmas something had reverted, and after two hibernates it was back to sitting on the GRUB screen 200 kilometers away with no one to hit the Enter key for a week. Lucky my computer doesn't use a lot of juice or this would be doubly annoying!

n.b. I wonder how much CO2 GRUB is responsible for?!

Sean DS (se4n-1) wrote :

Temporary fix: in /etc/grub.d/00_header, line 242, change the timeout from -1 to any positive integer.
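
The line number quoted above is specific to one version of the file, so it may be safer to search for the assignment than to rely on line 242. A minimal sketch, assuming the script still contains the literal text timeout=-1; the helper name find_infinite_timeout is hypothetical.

```shell
# Hypothetical helper: print "LINE:TEXT" for every line in the given file that
# contains the text timeout=-1. The file is only read, never modified.
# $1 = path to the script to inspect, e.g. /etc/grub.d/00_header
find_infinite_timeout() {
    grep -n 'timeout=-1' "$1"
}

# Example use on a real system (not run here):
#   find_infinite_timeout /etc/grub.d/00_header
```

After editing the reported line by hand, grub.cfg has to be regenerated (for example with update-grub) for the change to reach the boot menu.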

mihow (mihow) wrote :

Same issue here. Very annoying and embarrassing when maintaining an Ubuntu server for a client. I was using the same fix Sean DS posted but the issue returned after I upgraded the kernel.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu-meta (Ubuntu):
status: New → Confirmed
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Submitted merge proposals that parameterize this behavior.

Colin Watson (cjwatson)
no longer affects: ubuntu-meta (Ubuntu)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.99-22ubuntu1

---------------
grub2 (1.99-22ubuntu1) quantal; urgency=low

  [ Colin Watson ]
  * Resynchronise with Debian. Remaining changes:
    - Adjust for default Ubuntu boot options ("quiet splash").
    - Default to hiding the menu; holding down Shift at boot will show it.
    - Set a monochromatic theme and an appropriate background for Ubuntu.
    - Apply Ubuntu GRUB Legacy changes to legacy update-grub script.
    - Fix backslash-escaping in merge_debconf_into_conf.
    - Remove "GNU/Linux" from default distributor string.
    - Add crashkernel option.
    - Bypass menu unless other OSes are installed or Shift is pressed.
    - Allow Shift to interrupt 'sleep --interruptible'.
    - Reduce visual clutter in normal mode.
    - Remove verbose messages printed before reading configuration.
    - Suppress kernel/initrd progress messages, except in recovery mode.
    - Show the boot menu if the previous boot failed.
    - Don't generate device.map during grub-install or grub-mkconfig.
    - Adjust upgrade version checks for Ubuntu.
    - Suppress "GRUB loading" message unless Shift is held down.
    - Adjust versions of grub-doc and grub-legacy-doc conflicts.
    - Fix LVM/RAID probing in the absence of /boot/grub/device.map.
    - Look for .mo files in /usr/share/locale-langpack first.
    - Build-depend on qemu-kvm rather than qemu-system for grub-pc tests.
    - Check hardware support before using gfxpayload=keep.
    - Put second and subsequent Linux menu entries in a submenu.
    - Preferred resolution detection for VBE.
    - Set vt.handoff=7 for smooth handoff to kernel graphical mode.
    - Update default/grub.md5sum to include maverick's default md5sum.
    - In recovery mode, add nomodeset to the Linux kernel arguments, and
      remove the 'set gfxpayload=keep' command.
    - Skip Windows os-prober entries on Wubi systems, and suppress the menu
      by default if those are the only other-OS entries.
    - Handle probing striped DM-RAID devices.
    - Replace 'single' by 'recovery' when friendly-recovery is installed.
    - Use qemu -no-kvm in tests for now to work around LP #947597.
    - Disable cursor as early as possible in grub_main.
    - Don't crash on inaccessible loop device backing paths.
    - Backport several upstream EFI device discovery patches.

  [ Ben Howard ]
  * Parameterization of recordfail setting. This allows users to define the
    default time out of GRUB when recordfail has been set. The current
    setting causes hangs on headless and appliances where access to the
    console is limited or prohibited. (LP: #669481)

grub2 (1.99-22) unstable; urgency=low

  [ Debconf translations ]
  * Khmer added (Khoem Sokhem)
  * Slovenian (Vanja Cvelbar). Closes: #670616
  * Traditional Chinese (Vincent Chen).
  * Vietnamese (Hai Lang).
  * Marathi (Sampada Nakhare)
  * Finnish (Timo Jyrinki). Closes: #673976
  * Latvian (Rūdolfs Mazurs). Closes: #674697

  [ Colin Watson ]
  * Make apport hook compatible with Python 3.
  * Add upstream r3476 (fix memory leak in grub_disk_read_small) to
    4k_sectors.patch, otherwise the larger disk cache due to
    efi_disk_cach...


Changed in grub2 (Ubuntu):
status: Triaged → Fix Released
Bart Verwilst (verwilst) wrote :

While fixed for Quantal, Precise still suffers from this issue. Since it's the latest LTS release - and thus used for most of the servers out there - it would be great to see this backported as well.

Scott Moser (smoser)
tags: added: cloud-images cloud-images-build
Changed in grub2 (Ubuntu Precise):
status: New → Triaged
importance: Undecided → Medium
Scott Moser (smoser) wrote :

For anyone watching this bug, the easy fix here for 12.10 and later (we intend to backport to 12.04) is:
 echo GRUB_RECORDFAIL_TIMEOUT=0 | sudo tee -a /etc/default/grub
 sudo update-grub

Running 'update-grub' is necessary for the change to take effect, and also to avoid a config-file prompt on grub upgrade.
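
Because `tee -a` appends, running the command above more than once leaves several GRUB_RECORDFAIL_TIMEOUT lines in the file. A minimal sketch of a repeatable variant follows, assuming GNU sed (for -i) and simple values such as integers; the helper name set_default is hypothetical.

```shell
# Hypothetical helper: set KEY=VALUE in a shell-style config file, replacing an
# existing assignment in place or appending one if the key is absent.
# $1 = file, $2 = key, $3 = value (assumed to contain no '|' or '&')
set_default() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite every matching line (GNU sed -i).
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key absent: append a new assignment.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example use on a real system, run as root (not run here):
#   set_default /etc/default/grub GRUB_RECORDFAIL_TIMEOUT 0
#   update-grub
```

Replacing rather than appending keeps the file readable and makes the setting safe to apply from provisioning scripts that may run repeatedly.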

description: updated
Scott Moser (smoser) wrote :

after 'update-grub', you should also run:
 DEBIAN_FRONTEND=noninteractive dpkg-reconfigure grub-pc

Louis Bouchard (louis)
description: updated
Louis Bouchard (louis) wrote :

Here is the debdiff for the requested SRU of the Quantal fix to Precise

Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Mikael, or anyone else affected,

Accepted grub2 into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/grub2/1.99-21ubuntu3.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2 (Ubuntu Precise):
status: Triaged → Fix Committed
tags: added: verification-needed
Adam Conrad (adconrad) wrote :

Hello Mikael, or anyone else affected,

Accepted grub2 into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/grub2/1.99-21ubuntu3.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Louis Bouchard (louis)
tags: added: verification-done
removed: verification-needed
Colin Watson (cjwatson) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.99-21ubuntu3.4

---------------
grub2 (1.99-21ubuntu3.4) precise-proposed; urgency=low

  * Revert 1.99-21ubuntu3.2 again, as it was erroneously included again
    in 1.99-21ubuntu3.3.

grub2 (1.99-21ubuntu3.3) precise-proposed; urgency=low

  [ Ben Howard ]
  * Parameterization of recordfail setting. This allows users to define the
    default time out of GRUB when recordfail has been set. The current
    setting causes hangs on headless and appliances where access to the
    console is limited or prohibited. (LP: #669481)

grub2 (1.99-21ubuntu3.2) precise-proposed; urgency=low

  * Revert previous SRU. This caused AMI cloud images to prompt about the
    changed configuration file, breaking automated upgrades. (LP: #1009294)
  * This reopens bug #978464, which will break Ubuntu 10.04->12.04 upgrades
    if user does not opt-in to reinstall grub-pc bootloader when prompted.
 -- Clint Byrum <email address hidden> Wed, 12 Sep 2012 12:04:59 -0700

Changed in grub2 (Ubuntu Precise):
status: Fix Committed → Fix Released
Chris J Arges (arges)
Changed in grub2 (Ubuntu Oneiric):
importance: Undecided → Medium
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote : Please test proposed package

Hello Mikael, or anyone else affected,

Accepted grub2 into oneiric-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/grub2/1.99-12ubuntu5.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2 (Ubuntu Oneiric):
status: Confirmed → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
Louis Bouchard (louis) wrote :

fix confirmed for Oneiric as well, with latest grub2 in -proposed

tags: added: verification-done
removed: verification-needed
Scott Kitterman (kitterman) wrote :

--- Releasing grub2 ---
Proposed: 1.99-12ubuntu5.1
Release: 1.99-12ubuntu5
Copied to oneiric-updates

Changed in grub2 (Ubuntu Oneiric):
status: Fix Committed → Fix Released
chrone (chrone81) wrote :

This issue still persists on a headless Ubuntu 14.04.2 LTS server. I needed to edit /etc/default/grub to add the GRUB_RECORDFAIL_TIMEOUT=2 option and then update grub to avoid the boot getting stuck.

I wish Ubuntu came with GRUB_RECORDFAIL_TIMEOUT=2 built in as the default. :)

Robie Basak (racb) wrote :

This bug was originally fixed on the basis that by adding GRUB_RECORDFAIL_TIMEOUT users could work around the issue. The fix worked as expected, but users are still hitting the issue without having tuned this value. I thought it would be easier to track the question of the default in a separate bug, so I've created bug 1443735 for that, rather than re-opening this bug and confusing the status of the two separate fixes.
