update-grub-legacy-ec2 ignores kernels named -generic

Bug #1005551 reported by Ben Howard on 2012-05-28
This bug affects 2 people
Affects                  Status         Importance   Assigned to   Milestone
cloud-init (Ubuntu)      Fix Released   Critical     Unassigned
  Precise                Fix Released   High         Unassigned

Bug Description

== Begin SRU Information ==
[General Description]
Amazon's EC2 is a Xen-based cloud platform, and the bootloader it uses is 'pvgrub'. pvgrub runs inside a Xen instance, reads /boot/grub/menu.lst, and loads the kernels specified there. It does not support grub2-style configuration (/boot/grub/grub.cfg). For this reason, cloud images include a package named 'grub-legacy-ec2' that maintains /boot/grub/menu.lst.

grub-legacy-ec2 does not simply assume that every kernel is a candidate for inclusion in /boot/grub/menu.lst. Instead, it applies a very basic whitelist, which previously required a kernel's name to end in '-virtual'.

During the 12.10 development cycle, the -virtual kernel went away. It is now simply a -generic kernel with a subset of modules.

[Impact]
12.04 now supports running a 12.10 "enablement kernel" (https://wiki.ubuntu.com/Kernel/LTSEnablementStack). These new kernels follow the newer naming scheme and end in '-generic' rather than '-virtual'. As a result, grub-legacy-ec2's whitelist will not write entries for them to /boot/grub/menu.lst.

The end result is that a user who installs these kernels cannot easily boot them on EC2 (or any other Xen guest).

[Test Case]
 * Boot a cloud instance or cloud image.
   The daily-build EC2 AMI ID listed at http://cloud-images.ubuntu.com/server/precise/current/ is fine.
 * Install a backports kernel:
   sudo apt-get install linux-generic-lts-quantal
 * Verify that the new kernel is included in /boot/grub/menu.lst.
   Previously, the kernel would not be included there:
   $ grep "3.5.[0-9]" /boot/grub/menu.lst || echo NOT_FOUND
   NOT_FOUND

   With the new update-grub-legacy-ec2, it will be:
   # enable proposed
   $ echo "deb http://archive.ubuntu.com/ubuntu precise-proposed main" |
     sudo tee /etc/apt/sources.list.d/proposed.list
   $ sudo apt-get update
   $ sudo apt-get install grub-legacy-ec2
   $ sudo update-grub-legacy-ec2
   $ grep "3.5.[0-9]" /boot/grub/menu.lst || echo NOT_FOUND
   title Ubuntu 12.04.2 LTS, kernel 3.5.0-24-generic
   kernel /boot/vmlinuz-3.5.0-24-generic root=LABEL=cloudimg-rootfs ro console=hvc0
   ...
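The last step of the test case checks a single kernel version by hand. As a minimal sketch, the same check can be run over every installed kernel: the function below prints each kernel path that has no matching `kernel` line in a given menu.lst. The name `missing_from_menu` is hypothetical and not part of grub-legacy-ec2; it assumes the `kernel /boot/vmlinuz-... root=...` line layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of grub-legacy-ec2.
# missing_from_menu MENU_LST KERNEL...: print each KERNEL path that does not
# appear as the image on a "kernel" line of MENU_LST.
missing_from_menu() {
    menu="$1"
    shift
    for k in "$@"; do
        # Field 1 is the "kernel" keyword, field 2 is the image path.
        if ! awk -v k="$k" '$1 == "kernel" && $2 == k { found = 1 }
                            END { exit !found }' "$menu"; then
            echo "$k"
        fi
    done
}
```

On an instance this would be run as `missing_from_menu /boot/grub/menu.lst /boot/vmlinuz-*`. Before the fix, the 3.5 -generic kernel should be printed; after installing the fixed grub-legacy-ec2, it should not be.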

[Regression Potential]
The biggest potential for regression would be writing an entry to /boot/grub/menu.lst for a kernel that would not boot. The included patch guards against this by checking the version number as well as the name.

That amounts to:
 dpkg --compare-versions ${ver_flavor%-generic} ge 3.4.0-3 && return 0;;
where 'ver_flavor' is the version string taken from the kernel's file name (e.g. 2.6.35-13-generic from /boot/vmlinuz-2.6.35-13-generic).
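For illustration, the whitelist and the version check above can be sketched as a standalone shell function. This is a hypothetical reconstruction, not the packaged update-grub-legacy-ec2 code: only the `*-ec2` and `*-virtual|*-generic` patterns and the `3.4.0-3` threshold come from this report. The helper name `version_ge`, accepting -virtual kernels unconditionally (the real -virtual check is not shown here), and the GNU `sort -V` fallback used when `dpkg` is absent are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the grub-legacy-ec2 kernel whitelist;
# not the real update-grub-legacy-ec2 code.

# version_ge A B: succeed if version A >= version B.
version_ge() {
    if command -v dpkg >/dev/null 2>&1; then
        dpkg --compare-versions "$1" ge "$2"
    else
        # Assumption: GNU sort -V orders simple versions such as 3.5.0-23
        # the same way dpkg does.
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
}

# is_xen_kernel PATH: succeed if PATH names a kernel pvgrub should boot.
is_xen_kernel() {
    case "${1}" in
        *-ec2) return 0;;
        *-virtual) return 0;;  # assumption: real -virtual check not shown
        *-generic)
            # /boot/vmlinuz-3.5.0-23-generic -> 3.5.0-23-generic
            ver_flavor="${1##*/vmlinuz-}"
            # Only -generic kernels from 3.4.0-3 onward are accepted.
            version_ge "${ver_flavor%-generic}" 3.4.0-3 && return 0;;
    esac
    return 1
}
```

With the pre-fix pattern (`*-virtual)` alone), a name such as /boot/vmlinuz-3.5.0-23-generic matches no branch and falls through to `return 1`, which corresponds to the NOT_FOUND result in the test case.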
== End SRU Information ==

Quantal stopped booting on EC2 around 2012-05-25.

ben@padfoot:~$ ec2-get-console-output -i i-75727213
Required parameter 'INSTANCE' missing (-h for usage)
ben@padfoot:~$ ec2-get-console-output i-75727213
i-75727213
2012-05-28T14:17:49+0000
Xen Minimal OS!
  start_info: 0xb10000(VA)
    nr_pages: 0x6a400
  shared_inf: 0x001a5000(MA)
     pt_base: 0xb13000(VA)
nr_pt_frames: 0x9
    mfn_list: 0x967000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: root=/dev/sda1 ro 4
  stack: 0x946780-0x966780
MM: Init
      _text: 0x0(VA)
     _etext: 0x61e65(VA)
   _erodata: 0x76000(VA)
     _edata: 0x7b6d4(VA)
stack start: 0x946780(VA)
       _end: 0x966d34(VA)
  start_pfn: b1f
    max_pfn: 6a400
Mapping memory range 0xc00000 - 0x6a400000
setting 0x0-0x76000 readonly
skipped 0x1000
MM: Initialise page allocator for e6c000(e6c000)-0(6a400000)
MM: done
Demand map pfns at 6a401000-7a401000.
Heap resides at 7a402000-ba402000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x6a401000.
Initialising scheduler
Thread "Idle": pointer: 0x7a402008, stack: 0x6a030000
Initialising xenbus
Thread "xenstore": pointer: 0x7a402478, stack: 0x6a040000
Dummy main: start_info=0x966880
Thread "main": pointer: 0x7a4028e8, stack: 0x6a050000
"main" "root=/dev/sda1" "ro" "4"
vbd 2049 is hd0
******************* BLKFRONT for device/vbd/2049 **********

backend at /local/domain/0/backend/vbd/126/2049
Failed to read /local/domain/0/backend/vbd/126/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/126/2049/feature-flush-cache.
16777216 sectors of 0 bytes
**************************
vbd 2050 is hd1
******************* BLKFRONT for device/vbd/2050 **********

backend at /local/domain/0/backend/vbd/126/2050
Failed to read /local/domain/0/backend/vbd/126/2050/feature-barrier.
Failed to read /local/domain/0/backend/vbd/126/2050/feature-flush-cache.
312705024 sectors of 0 bytes
**************************
vbd 2051 is hd2
******************* BLKFRONT for device/vbd/2051 **********

backend at /local/domain/0/backend/vbd/126/2051
Failed to read /local/domain/0/backend/vbd/126/2051/feature-barrier.
Failed to read /local/domain/0/backend/vbd/126/2051/feature-flush-cache.
1835008 sectors of 0 bytes
**************************
Booting 'Ubuntu quantal (development branch), memtest86+'

root (hd0)
 Filesystem type is ext2fs, using whole disk
kernel /boot/memtest86+.bin

xc_dom_probe_bzimage_kernel: kernel is not a bzImage
ERROR Invalid kernel: xc_dom_find_loader: no loader found

xc_dom_core.c:536: panic: xc_dom_find_loader: no loader found
xc_dom_parse_image returned -1

Error 9: Unknown boot failure

Press any key to continue...

On 28.05.2012 16:25, Ben Howard wrote:
> [...]
> Booting 'Ubuntu quantal (development branch), memtest86+'
>
> root (hd0)
> Filesystem type is ext2fs, using whole disk

> kernel /boot/memtest86+.bin

^Huh?!?

Oh...

After cracking open a local copy of the image, it looks like linux-image-virtual has gone away. We are assuming that linux-image-virtual is going to be there.

Taking ownership of the bug, and raising to critical.

I see this was deleted on 2012-05-22:
https://launchpad.net/ubuntu/quantal/amd64/linux-image-3.4.0-2-virtual

Changed in ubuntu:
assignee: Canonical Kernel Team (canonical-kernel-team) → Ben Howard (utlemming)
status: New → Confirmed
importance: Undecided → Critical

Tested and submitted a fix.

The problem is that menu.lst is generated by the grub-legacy-ec2 package. To determine which kernels are appropriate, it looks for kernels named -virtual and excludes the others before adding the rest to menu.lst. Since -generic is the new -virtual, no valid kernel is being chosen. This is a simple one-line fix.

Merge proposal is out for smoser.

=== modified file 'debian/update-grub-legacy-ec2'
--- debian/update-grub-legacy-ec2 2011-03-02 04:22:28 +0000
+++ debian/update-grub-legacy-ec2 2012-05-28 22:33:17 +0000
@@ -1401,7 +1401,7 @@
is_xen_kernel() {
case "${1}" in
*-ec2) return 0;;
- *-virtual)
+ *-virtual|*-generic)
# input is like /boot/vmlinuz-2.6.35-13-virtual
# get the version string out of it.
local ver=""

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1005551/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → cloud-init (Ubuntu)

On 29.05.2012 00:15, Ben Howard wrote:
> Oh...
>
> After cracking open a local copy of the image, it looks like linux-
> image-virtual has gone away. We are assuming that linux-image-virtual is
> going to be there.
>
Actually no. This may be a result of merging generic and virtual images together
as discussed at UDS. The linux-virtual meta package in theory should remain and
point to the appropriate generic kernel binary package.
But of course that may not be what the pvgrub config generator expects.

Ben,
  The other thing that likely needs fixing here is "copy-out-kernels" in automated-ec2-builds.
  It will need to know that certain -generic kernels (and possibly -generic-pae?) should be copied out as the desirable kernel.
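As a rough illustration of the kind of change meant here, the hypothetical function below (`newest_boot_kernel`, not taken from automated-ec2-builds or copy-out-kernels) picks the highest-versioned -virtual or -generic kernel from a list of paths. It deliberately leaves out -generic-pae, which the comment above marks as an open question, does not apply any version threshold, and assumes GNU `sort -V` is available for version ordering.

```shell
#!/bin/sh
# Hypothetical sketch; not the actual copy-out-kernels script.
# newest_boot_kernel PATH...: print the path of the highest-versioned
# kernel whose name ends in -virtual or -generic.
newest_boot_kernel() {
    for k in "$@"; do
        case "$k" in
            # *-generic does not match names ending in -generic-pae.
            *-virtual|*-generic) printf '%s\n' "$k";;
        esac
    done | sort -V | tail -n 1
}
```

For example, given /boot/vmlinuz-3.2.0-37-virtual and /boot/vmlinuz-3.5.0-23-generic, it prints the 3.5.0-23-generic path.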

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.6.3-0ubuntu2

---------------
cloud-init (0.6.3-0ubuntu2) quantal; urgency=high

  * Added -generic to Xen kernels list since -virtual has been dropped with
    Quantal. (LP: #1005551)
 -- Ben Howard <email address hidden> Tue, 29 May 2012 12:59:01 -0600

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser) wrote :

verified:
$ cat /etc/cloud/build.info
build_name: server
serial: 20120531
$ ec2metadata --ami-id
ami-f4a9089d

#us-east-1 ami-f4a9089d ebs/ubuntu-quantal-daily-amd64-server-20120531

Marcin Juszkiewicz (hrw) wrote :

Bug reappears when linux-lts-quantal kernels are installed on 12.04 instances.

Scott Moser (smoser) on 2013-01-09
Changed in cloud-init (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) on 2013-01-09
summary: - Quantal does not boot on EC2
+ update-grub-legacy-ec2 ignores kernels named -generic
Scott Moser (smoser) wrote :

I've committed changes for this in a precise branch at lp:~smoser/ubuntu/precise/cloud-init/sru . I have a ppa build of that at https://launchpad.net/~smoser/+archive/cloud-init-test/ . Any testing on that would be appreciated.

The plan is to SRU this as soon as the current SRU moves to -updates.

Marcin Juszkiewicz (hrw) wrote :

Preparing to replace grub-legacy-ec2 0.6.3-0ubuntu1.4 (using .../grub-legacy-ec2_0.6.3-0ubuntu1.5~ppa0_all.deb) ...
Leaving 'diversion of /usr/sbin/grub-set-default to /usr/sbin/grub-set-default.real by grub-legacy-ec2'
Unpacking replacement grub-legacy-ec2 ...
Processing triggers for man-db ...
Setting up isc-dhcp-common (4.1.ESV-R4-0ubuntu5.6) ...
Setting up isc-dhcp-client (4.1.ESV-R4-0ubuntu5.6) ...
Setting up dhcp3-client (4.1.ESV-R4-0ubuntu5.6) ...
Setting up dhcp3-common (4.1.ESV-R4-0ubuntu5.6) ...
Setting up grub-legacy-ec2 (0.6.3-0ubuntu1.5~ppa0) ...
Searching for GRUB installation directory ... found: /boot/grub
Searching for default file ... found: /boot/grub/default
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-3.5.0-22-generic
Found kernel: /boot/vmlinuz-3.5.0-21-generic
Found kernel: /boot/vmlinuz-3.2.0-36-virtual
Found kernel: /boot/vmlinuz-3.2.0-35-virtual
Found kernel: /boot/vmlinuz-3.2.0-23-virtual
Replacing config file /run/grub/menu.lst with new version
Updating /boot/grub/menu.lst ... done

so it works fine for me

Scott Moser (smoser) wrote :

just an update, I uploaded cloud-init with this fix to precise-proposed on 2013-01-31.
We're just still waiting on SRU team review.

Scott Moser (smoser) on 2013-02-19
description: updated

Hello Ben, or anyone else affected,

Accepted cloud-init into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/cloud-init/0.6.3-0ubuntu1.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Precise):
status: Confirmed → Fix Committed
tags: added: verification-needed
Scott Moser (smoser) on 2013-02-19
description: updated
Scott Moser (smoser) wrote :

Verified using latest precise released images in us-east-1 EC2.

# ami-0145d268 us-east-1/ebs/ubuntu-precise-12.04-amd64-server-20130204
$ ec2metadata --ami-id
ami-0145d268
$ ec2metadata --availability-zone
us-east-1b
$ dpkg-query --show grub-legacy-ec2
grub-legacy-ec2 0.6.3-0ubuntu1.4

$ sudo apt-get install linux-generic-lts-quantal -y -q
$ grep "3.5.[0-9]" /boot/grub/menu.lst || echo NOT_FOUND
NOT_FOUND
$ ls /boot/vmlinuz-3.*
/boot/vmlinuz-3.2.0-37-virtual /boot/vmlinuz-3.5.0-23-generic

$ echo "deb http://archive.ubuntu.com/ubuntu precise-proposed main" |
  sudo tee /etc/apt/sources.list.d/proposed.list
$ sudo apt-get update
$ sudo apt-get install grub-legacy-ec2
$ dpkg-query --show grub-legacy-ec2
grub-legacy-ec2 0.6.3-0ubuntu1.5
$ grep "^title.*3.5.[0-9]" /boot/grub/menu.lst || echo NOT_FOUND
title Ubuntu 12.04.2 LTS, kernel 3.5.0-23-generic
title Ubuntu 12.04.2 LTS, kernel 3.5.0-23-generic (recovery mode)

tags: added: verification-done
removed: verification-needed

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.6.3-0ubuntu1.5

---------------
cloud-init (0.6.3-0ubuntu1.5) precise-proposed; urgency=low

  * debian/update-grub-legacy-ec2: consider kernels bootable on ec2
    that are named -generic, in addition to -virtual. This fixes a problem
    where the kernels installed by linux-lts-quantal were not added to
    /boot/grub/menu.lst (LP: #1005551)
  * debian/patches/lp-1077020-fix-ca-certificates-blanklines.patch: fix
    adding of empty lines in ca-certificates file (LP: #1077020)
  * debian/patches/lp-1031065-nonet-not-start-networking.patch: do not 'start
    networking' in cloud-init-nonet upstart job. Doing so can cause networking
    to be started earlier than it should be. Instead, add a
    cloud-init-container job that runs only in a container and emits
    net-device-added (LP: #1031065).
  * debian/patches/lp-1037567-add-config-drive-v2-support.conf:
    backport support for config-drive-v2 which is part of Openstack Nova in
    Folsom and later. (LP: #1037567) (LP: #1100545)
 -- Scott Moser <email address hidden> Wed, 16 Jan 2013 19:37:57 -0500

Changed in cloud-init (Ubuntu Precise):
status: Fix Committed → Fix Released
no longer affects: linux-lts-quantal (Ubuntu)
no longer affects: linux-lts-quantal (Ubuntu Precise)