grub-pc needs some help in uec instances

Bug #623609 reported by Scott Moser on 2010-08-24
This bug affects 1 person
Affects              Importance  Assigned to
cloud-init (Ubuntu)  High        Scott Moser
  Lucid              Undecided   Unassigned
  Maverick           High        Scott Moser
grub2 (Ubuntu)       High        Colin Watson
  Lucid              Medium      Unassigned
  Maverick           High        Colin Watson

Bug Description

Binary package hint: grub2

The uec images are intended to "just work" in 6 different environments:

A.) ec2 with xen disk root device (/dev/sda1)
B.) UEC with scsi root device (/dev/sda1)
 1.) Booted via floppy that multiboots to (hd0,1)/boot/grub/core.img
 2.) Booted via '-kernel <linux_kernel>'
C.) UEC with virtio root device (/dev/vda1)
 1.) Booted via floppy that multiboots to (hd0,1)/boot/grub/core.img
 2.) Booted via '-kernel <linux_kernel>'
D.) nocloud, with virtio or scsi root device as /dev/sda or /dev/vda
  I.e., this is booting the partition image as if it were a disk.
  Booted via grub boot floppy that just finds root and /vmlinuz,
  /initrd.img and boots them

Currently, during our build process we trick grub-pc into installing as
if it were writing to /dev/sda (which is correct for 'B' above). That
is done by providing a pseudo grub-probe and running [1]
  grub-install --grub-setup=/bin/true /dev/sda && /usr/sbin/update-grub

By tricking grub-install into working, we get core.img and associated
files in /boot/grub that allow the bootloader to multiboot load and
boot the system. All of the A,B,C,D above boot first time correctly.

In A, B.2, and C.2 I'd like for grub-pc to generally just not
get in the way. B.2 and C.2 work just as well as B.1 and C.1 do, so
that's not too difficult.

In B.1, C.1 above, we want grub-pc to actually maintain
/boot/grub/core.img and other files. The installation to the MBR is
not necessary; it will be the floppy disk that multiboot loads
(hd0)/boot/grub/core.img rather than grub installed on the MBR.

My problems right now are:

1.) Currently update-grub is not set to run in /etc/kernel-img.conf
    because if it is, kernel installation will fail in 'A' (bug 610554)
2.) In C.1 and C.2 above, on the first grub-pc upgrade, the user is
    prompted where to install grub to. I'd like to seed the answer with
    /dev/vda.

I'm fine with determining in cloud-init which of the situations is
present, and on the first boot making it so everything appears to "just
work". I just need to figure out what the correct set of actions to take
is.
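
A minimal sketch of the device mapping that such a first-boot step would need, assuming the root device is named as in the list of environments above. The function name install_device_for is hypothetical and is not part of cloud-init. It only derives the whole-disk name; it does not check that the disk actually exists, which matters on EC2.

```shell
#!/bin/sh
# Hypothetical helper (not from cloud-init): map a root device to the
# whole-disk device that could be offered as the grub install target.
install_device_for() {
    case "$1" in
        /dev/sd[a-z][0-9]*|/dev/vd[a-z][0-9]*)
            # Cases B and C: strip the partition number, /dev/vda1 -> /dev/vda
            printf '%s\n' "$1" | sed 's/[0-9]*$//'
            ;;
        /dev/sd[a-z]|/dev/vd[a-z])
            # Case D: the partition image is already presented as a disk
            printf '%s\n' "$1"
            ;;
        *)
            # Unknown naming scheme: report failure and print nothing
            return 1
            ;;
    esac
}
```

For example, install_device_for /dev/vda1 prints /dev/vda, while an unrecognized name such as /dev/xvda1 makes the function return non-zero.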

Notes:
- upgrade/install as mentioned above can be done with:
  'apt-get install --reinstall grub-pc'
- On EC2, grub-pc is not needed. grub-legacy-ec2 (from the cloud-init
  package) maintains /boot/grub/menu.lst, which is read by EC2's
  pv-grub. grub-pc is not needed, but is present to keep a single
  image.
- on EC2, there is no /dev/sda, only /dev/sdaX, and grub's searching
  for bios drives in grub-probe won't work, as there isn't really a bios.
  $ grep . /proc/partitions
  major minor #blocks name
   202 1 10485760 sda1
   202 2 156352512 sda2
   202 3 917504 sda3
  $ ls /dev/sda*
  /dev/sda1 /dev/sda2 /dev/sda3
  $ df / | grep -v Filesystem
  /dev/sda1 10321208 722144 9074788 8% /
  $ sudo grub-probe --device /dev/sda1
  grub-probe: error: cannot find a GRUB drive for /dev/sda1. Check your
  device.map.
- 'D' is by far the lowest concern, but it seems if I can get a fix
  for the others, I can probably manage something.
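
The missing /dev/sda on EC2 can be detected from /proc/partitions alone. The following is a rough sketch under that assumption; the function name has_parent_disk is hypothetical, and the optional second argument exists only so the check can be run against a saved copy of the table.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the whole-disk device for a partition
# (e.g. sda for sda1) is listed in the partition table file.
# Usage: has_parent_disk PARTITION_NAME [TABLE_FILE]
has_parent_disk() {
    part=$1
    table=${2:-/proc/partitions}
    # Strip the trailing partition number: sda1 -> sda
    disk=$(printf '%s\n' "$part" | sed 's/[0-9]*$//')
    # The device name is the fourth column of /proc/partitions
    awk -v d="$disk" '$4 == d { found = 1 } END { exit !found }' "$table"
}
```

With the EC2 listing shown above, has_parent_disk sda1 fails because only sda1, sda2 and sda3 are present.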

--
[1] http://bazaar.launchpad.net/~ubuntu-on-ec2/vmbuilder/automated-ec2-builds/annotate/head%3A/vmbuilder-uec-ec2-fixes

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: grub-pc 1.98+20100804-4ubuntu1
ProcVersionSignature: Ubuntu 2.6.35-17.23-virtual 2.6.35.2
Uname: Linux 2.6.35-17-virtual i686
Architecture: i386
Date: Tue Aug 24 18:12:05 2010
Ec2AMI: ami-a2d339cb
Ec2AMIManifest: ubuntu-images-testing-us/ubuntu-maverick-daily-i386-server-20100824.manifest.xml
Ec2AvailabilityZone: us-east-1b
Ec2InstanceType: m1.small
Ec2Kernel: aki-407d9529
Ec2Ramdisk: unavailable
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: grub2

Scott Moser (smoser) wrote :
tags: added: uec-images
removed: i386
Colin Watson (cjwatson) wrote :

Can you point me to the code that arranges for there to be /dev/sda1 etc. on EC2 but no /dev/sda? I'd like to trace through it and think about this a bit.

Note that grub-probe doesn't really talk to the BIOS. It's just iterating through OS devices.

Scott Moser (smoser) wrote :

Colin, I'm not particularly sure where that code is. My guess is in the xen code, and possibly partially in the xen pv block device driver in the kernel.

Xen has always had this strange but useful functionality of assigning a file or block device in the dom0 to a partition in the guest:

I.e., you can do:
 Host : Guest
 /dev/sdb : /dev/xvda
 /dev/sdb : /dev/xvda1
 /home/smoser/part.img : /dev/xvda1
 /home/smoser/swap.img : /dev/xvda2
 /home/smoser/ephemeral.img : /dev/xvda3

We don't get to control the xen setup of these devices. In EC2, you provide EC2 with a partition image, and they make that data available as the first disk's first partition.

I think in the past, guests used to see a /dev/sda entry (along with the sda1 entries), but it was mostly fictitious. I.e., you couldn't rewrite the partition table.

Note that xen block devices would normally be named xvdX, but our kernel renames them to sdaX. I suppose it's possible that there is a bug in our kernel renaming code that ends up dropping sda.

Scott Moser (smoser) wrote :

Above, where I said "in the past, guests used to see a /dev/sda entry", I meant in my experience with xen prior to EC2. On EC2 I don't know that I have ever seen a /dev/sda entry. It's also possible that I am remembering incorrectly outside of EC2 as well.

Scott Moser (smoser) wrote :

It seems to me that I could accomplish all of the goals listed above if:
a.) I could seed "grub-pc grub-pc/install_devices multiselect" with 'NONE' or some other indication of "don't worry about it".
b.) I can fix grub-mkconfig on ec2 so that it does not exit with failure.
  grub-mkconfig is called from update-grub, and it calls grub-probe.

Thierry Carrez (ttx) on 2010-09-02
Changed in grub2 (Ubuntu Maverick):
assignee: nobody → Scott Moser (smoser)
importance: Undecided → High
tags: added: server-mrs
Thierry Carrez (ttx) on 2010-09-07
Changed in grub2 (Ubuntu Maverick):
milestone: none → ubuntu-10.10
Scott Moser (smoser) wrote :

Some more info here. Again, the failure of 'update-grub' in an ec2 instance is because update-grub calls grub-mkconfig. grub-mkconfig in one way or another ends up calling grub-probe, and grub-probe exits with failure.

The first call to grub-probe that is failing for me is from
 /etc/grub.d/10_linux ->
   /usr/lib/grub/grub-mkconfig_lib : prepare_grub_to_access_device()
  which runs
   grub-probe --device ${device} --target=drive
  At that point ${device} is the result of '${grub_probe} --target=device /' (which is '/dev/sda1').
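
The two probes described above can be reproduced by hand. This is only a sketch of the call sequence, not the code in grub-mkconfig_lib; the function name probe_root_drive is hypothetical, and the probe command is passed as a parameter so that a different binary or a stub can be substituted.

```shell
#!/bin/sh
# Hypothetical reproduction of the two grub-probe calls described above.
# Usage: probe_root_drive [GRUB_PROBE_COMMAND]
probe_root_drive() {
    grub_probe=${1:-grub-probe}
    # Step 1: find the OS device that holds / (e.g. /dev/sda1)
    device=$("$grub_probe" --target=device /) || return 1
    # Step 2: ask for the GRUB drive of that device; this is the call
    # that fails on EC2
    "$grub_probe" --device "$device" --target=drive
}
```

On an affected EC2 instance, running probe_root_drive as root should fail at the second step with the "cannot find a GRUB drive" error quoted in the description.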

I think the reason for this failure is that there *is* no /dev/sda in our ec2 instances. The reason for that is twofold:
a.) a patch in the kernel renames xen devices from xvd* to sd*
b.) the xen paravirtual device that is named 'sda1' is not actually a partition at all, but a disk. It just has a funny name. (I.e., in /sys you'll see that this is indeed a disk and not a partition.)

I'm guessing it's a bad assumption in grub code due to a or b above that is causing the failure.
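
One way to see point b in /sys is to look for the 'partition' attribute, which Linux exposes for real partitions but not for whole disks. A small sketch, assuming the /sys/class/block layout; the function name is_partition is hypothetical, and the second argument only allows pointing the check at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel considers NAME a partition.
# Usage: is_partition NAME [SYSFS_ROOT]
is_partition() {
    name=$1
    sysfs=${2:-/sys}
    # Real partitions carry a 'partition' attribute; whole disks do not
    [ -e "$sysfs/class/block/$name/partition" ]
}
```

If the description in b is right, is_partition sda1 would fail on an EC2 instance even though the name looks like a partition.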

Colin Watson (cjwatson) on 2010-09-08
Changed in grub2 (Ubuntu Maverick):
status: New → Confirmed
Colin Watson (cjwatson) wrote :

I have a patch for grub-probe on EC2.

For UEC: while there is a way to preseed install_devices to empty, if you do that then grub-install will never be run, which means that core.img will never be upgraded. I'd rather we didn't take that approach. Can you simply preseed grub-pc/install_devices to /dev/sda or /dev/vda as appropriate? grub-install seems to succeed without errors, so you shouldn't need to play any games with diversions.

Scott Moser (smoser) wrote :

Colin,
  Thanks for your help.
  I'll have cloud-init preseed appropriately on uec.

Colin Watson (cjwatson) wrote :

For EC2, since you don't need core.img as I understand it, you should be able to preseed this:

  grub-pc grub-pc/install_devices string
  grub-pc grub-pc/install_devices_empty boolean true

Colin Watson (cjwatson) wrote :

Sorry, that 'string' should be 'multiselect', strictly.
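
Putting the two preseeding suggestions together, the selections could be generated as below and piped to debconf-set-selections. This is a sketch only; the function name grub_pc_selections is hypothetical and is not taken from cloud-init.

```shell
#!/bin/sh
# Hypothetical helper: print debconf selections for grub-pc.
# With a device argument (UEC), name that device as the install target.
# Without one (EC2), mark the device list as intentionally empty.
grub_pc_selections() {
    if [ -n "${1:-}" ]; then
        printf 'grub-pc grub-pc/install_devices multiselect %s\n' "$1"
    else
        printf 'grub-pc grub-pc/install_devices multiselect\n'
        printf 'grub-pc grub-pc/install_devices_empty boolean true\n'
    fi
}

# Example use as root (not run here):
#   grub_pc_selections /dev/vda | debconf-set-selections
```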

Colin Watson (cjwatson) on 2010-09-10
Changed in grub2 (Ubuntu Maverick):
assignee: Scott Moser (smoser) → Colin Watson (cjwatson)
Colin Watson (cjwatson) wrote :

grub2 (1.98+20100804-4ubuntu6) maverick; urgency=low

  * Handle partition devices without corresponding disk devices
    (LP: #623609).

 -- Colin Watson <email address hidden> Fri, 10 Sep 2010 18:35:49 +0100

Changed in grub2 (Ubuntu Maverick):
status: Confirmed → Fix Released
Thierry Carrez (ttx) on 2010-09-14
Changed in cloud-init (Ubuntu Maverick):
assignee: nobody → Scott Moser (smoser)
importance: Undecided → High
milestone: none → ubuntu-10.10
status: New → Confirmed
Scott Moser (smoser) wrote :

Fix released:
cloud-init (0.5.15-0ubuntu1) maverick; urgency=low

  * New upstream release.
  * fix /etc/fstab cloudconfig entries for t1.micro and
    change default fstab values for ephemeral0 to nobootwait (LP: #634102)
  * grub-legacy-ec2: do not write chainload for grub2 to menu.lst
    (LP: #627451)
  * seed grub-pc correctly so update-grub runs on ec2 or uec(LP: #623609)
 -- Scott Moser <email address hidden> Sun, 12 Sep 2010 15:23:39 -0400

Changed in cloud-init (Ubuntu Maverick):
status: Confirmed → Fix Released
Scott Moser (smoser) on 2010-12-03
Changed in grub2 (Ubuntu Lucid):
importance: Undecided → Medium
Scott Moser (smoser) wrote :

I'd like to have the grub2 fix here backported to lucid, so 'update-grub' could successfully run in lucid instances.

I tried a quick build of the lucid package with '978_ubuntu_diskless_partitions.patch' applied, but grub-probe still fails like:
$ sudo grub-probe --device /dev/sda1
grub-probe: error: cannot find a GRUB drive for /dev/sda1. Check your device.map.

Scott Moser (smoser) wrote :

The cloud-init portion of this was fixed in lucid with cloud-init at version 0.5.10-0ubuntu1.5.
The grub-pc portion still exists, but is worked around in the UEC image build process at http://bazaar.launchpad.net/~ubuntu-on-ec2/vmbuilder/automated-ec2-builds/revision/208 .

That commit shows 2 options for working around it:
a.) 'rm /boot/grub/video.lst' in the instance.
    This is the solution that I ended up going with.
b.) sed -i 's,^#GRUB_TERMINAL=console,GRUB_TERMINAL=console,' /etc/default/grub
    This solution ends up prompting the user on grub-pc reinstall or upgrade.
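
For trying either workaround on a copy before touching a real instance, the two options can be wrapped as below. This is a sketch; the function names are hypothetical, and the path arguments default to the locations named above.

```shell
#!/bin/sh
# Option a: remove the video module list (default path as given above).
remove_video_list() {
    rm -f "${1:-/boot/grub/video.lst}"
}

# Option b: uncomment GRUB_TERMINAL=console. Uses the same sed expression
# as above, but writes through a temporary file instead of 'sed -i'.
force_console_terminal() {
    conf=${1:-/etc/default/grub}
    sed 's,^#GRUB_TERMINAL=console,GRUB_TERMINAL=console,' "$conf" > "$conf.tmp" &&
        mv "$conf.tmp" "$conf"
}
```

Calling force_console_terminal with the path of a copied /etc/default/grub leaves every other line of that file unchanged.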

Changed in cloud-init (Ubuntu Lucid):
status: New → Fix Released

I can confirm that 'rm /boot/grub/video.lst' works on Ubuntu 10.04.1 LTS (Lucid). Thanks for the tip, Scott.

Hello Scott, or anyone else affected,

Accepted grub2 into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in grub2 (Ubuntu Lucid):
status: New → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.98-1ubuntu13

---------------
grub2 (1.98-1ubuntu13) lucid-proposed; urgency=low

  [ Colin Watson ]
  * Handle partition devices without corresponding disk devices
    (LP: #623609).

  [ Ken Stailey ]
  * Backport upstream patch to skip LVM snapshots (LP: #563895).
 -- Colin Watson <email address hidden> Fri, 20 Jan 2012 12:08:36 +0000

Changed in grub2 (Ubuntu Lucid):
status: Fix Committed → Fix Released