recordfail false positive causes headless servers to hang on boot by default

Bug #1443735 reported by Robie Basak on 2015-04-14
This bug affects 7 people
Affects                  Status        Importance  Assigned to
grub2 (Debian)           Fix Released  Unknown
grub2 (Ubuntu)           Fix Released  High        Unassigned
  Precise                Fix Released  High        Unassigned
  Trusty                 Fix Released  High        Unassigned
  Utopic                 Fix Released  High        Unassigned
  Vivid                  Fix Released  High        Unassigned
grub2-signed (Ubuntu)    Fix Released  Undecided   Unassigned
  Precise                Fix Released  Undecided   Unassigned
  Trusty                 Fix Released  Undecided   Unassigned
  Utopic                 Fix Released  Undecided   Unassigned
  Vivid                  Fix Released  Undecided   Unassigned

Bug Description

[Impact]

On a headless server system, a user who does not have easy access to the console may find the system fails to come up after a power cut because the boot is blocked on a console menu prompt from grub that does not time out.

[Workaround]

Set GRUB_RECORDFAIL_TIMEOUT to some positive value (e.g. 30) in /etc/default/grub and then run "sudo update-grub". However, this needs to have been done before the problem occurs; once it has occurred, the only option a user has is to add a head to the headless system.
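
A minimal Python sketch of the workaround's edit step, assuming /etc/default/grub is a plain shell fragment of KEY=value lines. set_grub_default is a hypothetical helper, not part of grub: it replaces any existing uncommented assignment or appends a new one, and returns the new text rather than touching the real file. Writing the result back as root and running "sudo update-grub" to regenerate grub.cfg are still required.

```python
import re


def set_grub_default(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set, as in /etc/default/grub.

    Hypothetical helper: every uncommented `key=` line is replaced; if there
    is none, the assignment is appended. Commented-out lines are left alone.
    """
    # [ \t]* rather than \s* so a match can never swallow a preceding blank line.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _match: line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


if __name__ == "__main__":
    sample = "GRUB_DEFAULT=0\n#GRUB_RECORDFAIL_TIMEOUT=5\nGRUB_TIMEOUT=10\n"
    print(set_grub_default(sample, "GRUB_RECORDFAIL_TIMEOUT", "30"), end="")
```

Since the file is sourced as shell, a later assignment overrides an earlier one; replacing every active line rather than only appending keeps the file unambiguous.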

[Development Fix]

The default for GRUB_RECORDFAIL_TIMEOUT changed from -1 (wait indefinitely) to 30 (proceed anyway after 30 seconds). Accepted in Debian and synced to Ubuntu in Wily; currently held in wily-proposed due to some items in the unapproved queue.
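
To make the change concrete, here is a simplified Python model of the boot-time menu decision. It assumes, as described above, that when the recordfail flag is set GRUB uses GRUB_RECORDFAIL_TIMEOUT instead of the normal menu timeout, and that a timeout of -1 means wait indefinitely. The function name, the None return for "waits forever" and the GRUB_TIMEOUT value of 10 are illustrative only; the real logic lives in the generated grub.cfg and also involves settings (such as hidden timeouts) that this sketch ignores.

```python
from typing import Optional

WAIT_FOREVER = -1  # GRUB's "timeout" variable: -1 means wait for a keypress


def menu_timeout(recordfail: bool, grub_timeout: int,
                 recordfail_timeout: int) -> Optional[int]:
    """Seconds before the default entry boots, or None if GRUB waits forever.

    Simplified model: hidden-timeout and other menu settings are ignored.
    """
    timeout = recordfail_timeout if recordfail else grub_timeout
    return None if timeout == WAIT_FOREVER else timeout


OLD_DEFAULT = -1  # GRUB_RECORDFAIL_TIMEOUT before this fix
NEW_DEFAULT = 30  # GRUB_RECORDFAIL_TIMEOUT after this fix

if __name__ == "__main__":
    for label, default in (("old", OLD_DEFAULT), ("new", NEW_DEFAULT)):
        seconds = menu_timeout(recordfail=True, grub_timeout=10,
                               recordfail_timeout=default)
        print(label, "waits forever" if seconds is None else f"boots after {seconds}s")
```

The "old" line is the headless-server hang this bug is about; the "new" line is the fixed behaviour.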

[Stable Fix]

Same as development fix.

[Regression Potential]

This fix deliberately changes user-visible behaviour, because the previous behaviour led to this bug. Users of non-headless systems (e.g. desktops) may miss the boot menu and return to find a failed boot; however, if they try again, they should still see the menu prompt for 30 seconds.

[Test Case]

Steps to reproduce:

1. Boot a Vivid system installed from the server installer (not a cloud image).
2. Kill the power (or VM) while the kernel is initialising but before it has started init.
3. Power up the system (or start the VM) again.

Expected behaviour: the system should boot without user intervention.

Actual behaviour: the system hangs on the grub prompt.
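
Why step 2 leads to the prompt can be sketched as a tiny state machine. This assumes the behaviour described in this report and the discussion below: GRUB records a recordfail flag in its persistent environment when it starts booting an entry, and the grub2 packaging clears the flag only once the system has come up far enough (for example with grub-editenv; the exact mechanism is not modelled). A power cut in between leaves the flag set, so the next boot looks like a failed one even though nothing went wrong. GrubEnv and boot are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GrubEnv:
    """Hypothetical stand-in for GRUB's persistent environment block."""
    recordfail: bool = False


def boot(env: GrubEnv, interrupted: bool) -> bool:
    """Simulate one boot attempt; return True if GRUB forces the recordfail menu.

    `interrupted` means power was lost after GRUB set the flag but before the
    system got far enough to clear it (e.g. during kernel initialisation).
    """
    forced_menu = env.recordfail   # flag left over from the previous attempt
    env.recordfail = True          # GRUB records that this attempt has started
    if not interrupted:
        env.recordfail = False     # boot completed: the packaging clears the flag
    return forced_menu


if __name__ == "__main__":
    env = GrubEnv()
    print(boot(env, interrupted=True))   # steps 1-2: boot, kill power -> False
    print(boot(env, interrupted=False))  # step 3: power up again -> True
```

In this model the step 3 boot still completes; with the old default of -1, the real system would instead sit at the menu indefinitely.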

[Details]

This was previously raised in bug 669481, but the solution applied then was just to add the GRUB_RECORDFAIL_TIMEOUT setting, defaulting to -1. This allowed users to work around the problem by tuning GRUB_RECORDFAIL_TIMEOUT. I'm filing this bug separately because there is nothing wrong with the previous fix; it just didn't fix the problem for users by default. This bug is about fixing the default so that users don't have to discover and work around the issue.

An IRC discussion (http://irclogs.ubuntu.com/2015/02/27/%23ubuntu-devel.html#t13:54) concluded that everyone involved in the discussion is happy to change the timeout from infinity to 30s.

Colin asked for a fix in Debian, so I'll send a patch there and add a bug link. I'm also filing the bug here in order to track the fix in both Debian and Ubuntu.

Importance: High because of the impact on users of headless servers: from their perspective, this causes a system to fail to boot after an appropriately timed double power cut. I'm prompted to do this today because it just happened to me on my server, so perhaps it's more likely than I originally thought.

Robie Basak (racb) on 2015-04-14
description: updated
Robie Basak (racb) on 2015-04-14
summary: - recordfail false positive causes headless servers to hang on boot
+ recordfail false positive causes headless servers to hang on boot by default
Changed in grub2 (Debian):
status: Unknown → New
Changed in grub2 (Debian):
status: New → Fix Released
Robie Basak (racb) on 2015-05-19
Changed in grub2 (Ubuntu Precise):
status: New → Triaged
Changed in grub2 (Ubuntu Trusty):
status: New → Triaged
Changed in grub2 (Ubuntu Utopic):
status: New → Triaged
Changed in grub2 (Ubuntu Vivid):
status: New → Triaged
Changed in grub2 (Ubuntu Precise):
importance: Undecided → High
Changed in grub2 (Ubuntu Trusty):
importance: Undecided → High
Changed in grub2 (Ubuntu Utopic):
importance: Undecided → High
Changed in grub2 (Ubuntu Vivid):
importance: Undecided → High
Robie Basak (racb) on 2015-05-19
description: updated
Robie Basak (racb) wrote :

There's an existing grub SRU in flight (2.02~beta2-9ubuntu1.2 in Trusty), so I'm deferring the upload for all releases until it clears. These diffs are build-tested only.

tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.02~beta2-23

---------------
grub2 (2.02~beta2-23) unstable; urgency=medium

  [ Debconf translations ]
  * [da] Danish (Joe Dalton; closes: #781333).

  [ Felix Zielcke ]
  * Run the tests with LC_MESSAGES=C.UTF-8. Some tests fail with non
    english locale. (Closes: #782580)

  [ Mathieu Trudel-Lapierre ]
  * Backport from upstream:
    - arp, icmp: Fix handling in case of oversized or invalid packets.
      (LP: #1428005)

  [ Robie Basak ]
  * Change the default GRUB_RECORDFAIL_TIMEOUT to 30, so interactive users
    still get the opportunity to intervene after a real boot failure, but
    headless users will not end up stuck after boot failures that were
    really power failures (closes: #782552, LP: #1443735).

 -- Colin Watson <email address hidden> Thu, 14 May 2015 16:18:33 +0100

Changed in grub2 (Ubuntu):
status: Triaged → Fix Released
Robie Basak (racb) wrote :

Uploaded; now awaiting SRU team review.

Changed in grub2 (Ubuntu Precise):
status: Triaged → In Progress
Changed in grub2 (Ubuntu Trusty):
status: Triaged → In Progress
Changed in grub2 (Ubuntu Utopic):
status: Triaged → In Progress
Changed in grub2 (Ubuntu Vivid):
status: Triaged → In Progress

Hello Robie, or anyone else affected,

Accepted grub2 into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.02~beta2-22ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2 (Ubuntu Vivid):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in grub2 (Ubuntu Utopic):
status: In Progress → Fix Committed
Timo Aaltonen (tjaalton) wrote :

Hello Robie, or anyone else affected,

Accepted grub2 into utopic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.02~beta2-15ubuntu0.1 in a few hours, and then in the -proposed repository.

Changed in grub2 (Ubuntu Trusty):
status: In Progress → Fix Committed
Timo Aaltonen (tjaalton) wrote :

Hello Robie, or anyone else affected,

Accepted grub2 into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.02~beta2-9ubuntu1.3 in a few hours, and then in the -proposed repository.

Timo Aaltonen (tjaalton) wrote :

Hello Robie, or anyone else affected,

Accepted grub2 into precise-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/1.99-21ubuntu3.18 in a few hours, and then in the -proposed repository.

Changed in grub2 (Ubuntu Precise):
status: In Progress → Fix Committed
Timo Aaltonen (tjaalton) wrote :

Hello Robie, or anyone else affected,

Accepted grub2-signed into utopic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.38.1 in a few hours, and then in the -proposed repository.

Changed in grub2-signed (Ubuntu Utopic):
status: New → Fix Committed
Changed in grub2-signed (Ubuntu Vivid):
status: New → Fix Committed
Timo Aaltonen (tjaalton) wrote :

Hello Robie, or anyone else affected,

Accepted grub2-signed into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.46.1 in a few hours, and then in the -proposed repository.

Adam Conrad (adconrad) wrote :

Hello Robie, or anyone else affected,

Accepted grub2-signed into precise-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.9~ubuntu12.04.9 in a few hours, and then in the -proposed repository.

Changed in grub2-signed (Ubuntu):
status: New → Fix Released
Changed in grub2-signed (Ubuntu Precise):
status: New → Fix Committed
Changed in grub2-signed (Ubuntu Trusty):
status: New → Fix Committed
Adam Conrad (adconrad) wrote :

Hello Robie, or anyone else affected,

Accepted grub2-signed into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.34.4 in a few hours, and then in the -proposed repository.

Diogo Matsubara (matsubara) wrote :

I verified the fix works for vivid (2.02~beta2-22ubuntu1.1), trusty (2.02~beta2-9ubuntu1.3) and precise (1.99-21ubuntu3.18).

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.02~beta2-9ubuntu1.3

---------------
grub2 (2.02~beta2-9ubuntu1.3) trusty; urgency=medium

  * Do not hang headless servers indefinitely on boot after edge case power
    failure timing (LP: #1443735). Instead, time out after 30 seconds and boot
    anyway, including on non-headless systems.

 -- Robie Basak <email address hidden> Tue, 19 May 2015 13:31:03 +0100

Changed in grub2 (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for grub2 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.02~beta2-22ubuntu1.1

---------------
grub2 (2.02~beta2-22ubuntu1.1) vivid; urgency=medium

  * Do not hang headless servers indefinitely on boot after edge case power
    failure timing (LP: #1443735). Instead, time out after 30 seconds and boot
    anyway, including on non-headless systems.

 -- Robie Basak <email address hidden> Tue, 19 May 2015 13:47:00 +0100

Changed in grub2 (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.99-21ubuntu3.18

---------------
grub2 (1.99-21ubuntu3.18) precise; urgency=medium

  * Do not hang headless servers indefinitely on boot after edge case power
    failure timing (LP: #1443735). Instead, time out after 30 seconds and boot
    anyway, including on non-headless systems.

 -- Robie Basak <email address hidden> Tue, 19 May 2015 12:22:34 +0100

Changed in grub2 (Ubuntu Precise):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2-signed - 1.9~ubuntu12.04.9

---------------
grub2-signed (1.9~ubuntu12.04.9) precise; urgency=medium

  * Rebuild against grub-efi-amd64 1.99-21ubuntu3.18 (LP: #1443735).

 -- Robie Basak <email address hidden> Fri, 03 Jul 2015 10:19:22 +0000

Changed in grub2-signed (Ubuntu Precise):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2-signed - 1.34.4

---------------
grub2-signed (1.34.4) trusty; urgency=medium

  * Rebuild against grub-efi-amd64 2.02~beta2-9ubuntu1.3 (LP: #1443735).

 -- Robie Basak <email address hidden> Fri, 03 Jul 2015 10:19:28 +0000

Changed in grub2-signed (Ubuntu Trusty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2-signed - 1.38.1

---------------
grub2-signed (1.38.1) utopic; urgency=medium

  * Rebuild against grub-efi-amd64 2.02~beta2-15ubuntu0.1 (LP:
    #1443735).

 -- Robie Basak <email address hidden> Fri, 03 Jul 2015 10:19:32 +0000

Changed in grub2-signed (Ubuntu Utopic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2-signed - 1.46.1

---------------
grub2-signed (1.46.1) vivid; urgency=medium

  * Rebuild against grub-efi-amd64 2.02~beta2-22ubuntu1.1 (LP:
    #1443735).

 -- Robie Basak <email address hidden> Fri, 03 Jul 2015 10:19:37 +0000

Changed in grub2-signed (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.02~beta2-15ubuntu0.1

---------------
grub2 (2.02~beta2-15ubuntu0.1) utopic; urgency=medium

  * Do not hang headless servers indefinitely on boot after edge case power
    failure timing (LP: #1443735). Instead, time out after 30 seconds and boot
    anyway, including on non-headless systems.

 -- Robie Basak <email address hidden> Tue, 19 May 2015 13:44:35 +0100

Changed in grub2 (Ubuntu Utopic):
status: Fix Committed → Fix Released
Diogo Matsubara (matsubara) wrote :

I just tried grub2 (2.02~beta2-15ubuntu0.1) on Utopic and the fix works as intended.

Ewan (ewann) wrote :

Something you may wish to consider in relation to this fix:

Proposal 1)
I noticed the bug description already contains the comment "Maybe a future addition to recognizing $recordfail is to have a warning on the boot menu, but that is outside the scope of this report.". I +1 this for some future update.

Proposal 2) - might be deemed to be low enough risk to implement prior to dev/test/release of proposal 1.
Add the line:
GRUB_RECORDFAIL_TIMEOUT=30
to the default /etc/default/grub file

Context:
I arrived here while investigating a hibernate/resume issue with Ubuntu 15.04 on an Acer C710 (parrot) Chromebook.

Although it is possible to make hibernate/resume functional on this device, the hibernate action seems to cause recordfail to be set. As a consequence, I see the grub menu and a 30-second countdown on resume. (It's true that prior to this fix I saw the grub menu with no countdown, which was even more perplexing, so thank you :)

There were no settings in /etc/default/grub that had a value of 30, so it wasn't immediately obvious to me what was happening.

Implementing one of these proposals may avert support questions from laptop users, who experience what might seem an unconfigured, unrequested 30-second delay from grub when resuming. OTOH, maybe this configuration is such an edge case that it is preferable to do nothing at all.

In case someone else arrives here on the same journey, I'll link https://gist.github.com/ewann/b0aeedf57bd6879946c6 for related Acer C710 issues.

Robie Basak (racb) wrote :

Ewan,

Thank you for your comments.

> I +1 this for some future update.

I don't think anyone is working on this at the moment, or that there is any plan to update stable releases with it. Please feel free to file a separate bug to track it.

> Add the line:
> GRUB_RECORDFAIL_TIMEOUT=30
> to the default /etc/default/grub file

We deliberately didn't do this, as it would cause a prompt requiring user intervention on update for any user who has modified /etc/default/grub, both for updates during an existing stable release's lifetime and during a release upgrade.
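
The point above rests on the three-way comparison that dpkg makes for configuration files on upgrade; ucf, which packages can use for configuration files they manage outside dpkg, follows a similar rule. The Python sketch below is a simplified model of that rule, not dpkg's or ucf's actual code: conffile_action and its return labels are hypothetical, and whole file contents stand in for the checksums the real tools record. It also covers the postinst idea raised in the follow-up, where an appended line makes the file look locally modified.

```python
def conffile_action(old_shipped: str, current: str, new_shipped: str) -> str:
    """Simplified model of the three-way configuration-file check on upgrade.

    old_shipped: version the previously installed package shipped
    current:     file as it is now on disk
    new_shipped: version the package being installed ships
    Returns "keep", "install" (replace silently) or "prompt" (ask the user).
    """
    if current == new_shipped:
        return "keep"      # already identical to the new version
    if old_shipped == new_shipped:
        return "keep"      # package did not change the file: local edits win
    if current == old_shipped:
        return "install"   # never edited locally: take the new version silently
    return "prompt"        # both sides changed: user intervention needed


OLD = "GRUB_DEFAULT=0\nGRUB_TIMEOUT=10\n"
# Proposal 2: ship the recordfail default in the packaged file.
NEW = OLD + "GRUB_RECORDFAIL_TIMEOUT=30\n"
# An administrator's local change, made before the update.
EDITED = OLD.replace("GRUB_TIMEOUT=10", "GRUB_TIMEOUT=2")

if __name__ == "__main__":
    print(conffile_action(OLD, OLD, NEW))     # unmodified system: install
    print(conffile_action(OLD, EDITED, NEW))  # locally modified system: prompt
    # Postinst alternative: the appended line makes the on-disk file differ
    # from what was shipped, so a later packaged change prompts too.
    appended = OLD + "GRUB_RECORDFAIL_TIMEOUT=30\n"
    later = OLD + 'GRUB_CMDLINE_LINUX=""\n'
    print(conffile_action(OLD, appended, later))  # prompt
```

Only the unmodified system upgrades silently, which is why changing the shipped file (or editing it from a maintainer script) was avoided in favour of changing the built-in default.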

As far as I understand, the recordfail path should only get followed after a boot failure. If you're also getting it at other times, then please file a bug for that separate issue.

Ewan (ewann) wrote :

Robie,

> We deliberately didn't do this is as will cause a prompt
I suppose another way would be to conditionally append the value via a postinst script, but I further suppose you declined to take this approach because it lacks elegance.

> As far as I understand, the recordfail path should only get followed after a boot failure.
I opened https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1475620 - I agree the issue should get corrected at source, if possible.

> Please feel free to file a separate bug
I'll first see the outcome of 1475620.

thanks for your feedback

On Fri, Jul 17, 2015 at 12:02:17PM -0000, Ewan wrote:
> Robie,
>
> > We deliberately didn't do this is as will cause a prompt
> I suppose another way would be to conditionally append the value via a
> postinst script, but I further suppose you declined to take this
> approach because it lacks elegance.

It would cause an unnecessary conffile prompt on the next conffile
update following that, since it would appear to be a user-made change.

That's why the best thing to do is to leave files in /etc/ for only
users to change directly, rather than have maintainer scripts change
them. We do use .d/ directories to try and help with this, but I don't
think it'd help in this case as it won't solve the user confusion issue
you're trying to solve.

> > As far as I understand, the recordfail path should only get followed after a boot failure.
> I opened https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1475620
> - I agree the issue should get corrected at source, if possible.

Thank you for filing this. I've moved it to grub2 as I'm pretty sure
it's the grub2 packaging that arranges the recordfail test (as opposed
to kernel code or packaging).
