preseed with RAID and GPT fails to set bios_grub flag on first disk but does set it on second so install fails

Bug #566965 reported by Anton Altaparmakov on 2010-04-19
This bug affects 1 person
Affects                  Importance   Assigned to
partman-auto (Ubuntu)    High         Colin Watson
Lucid                    High         Colin Watson

Bug Description

Binary package hint: partman-base

Hi,

Note: I am not sure this belongs in partman-base but it definitely is in one of the partman-*...

Note: This is on latest Ubuntu Lucid 10.04 LTS. The time stamp on the installer is 20081029ubuntu100.

I am doing an automated install in which two 2TiB disks are each partitioned with a 1MiB BIOS boot partition, followed by three software RAID1 devices: one for / (20GiB), one for swap (4GiB), and one for /var (the rest of the space, i.e. just under 2TiB).

The snippet from the automated installation config file is this:

# Configure MD software RAID and partitioning.
preseed partman-basicfilesystems/choose_label string gpt
preseed partman-auto/method string raid
preseed partman-auto/disk string "/dev/sdb /dev/sdc"
preseed partman-auto/expert_recipe string "multiraid :: 1 1 1 free method{ biosgrub } . 20480 20480 20480 raid method{ raid } . 4096 4096 4096 raid method{ raid } . 20480 20480 -1 raid method{ raid } ."
preseed partman-auto-raid/recipe string "1 2 0 ext3 / /dev/sdb2#/dev/sdc2 . 1 2 0 swap - /dev/sdb3#/dev/sdc3 . 1 2 0 xfs /var /dev/sdb4#/dev/sdc4 ."
preseed partman-md/device_remove_md boolean true
preseed partman-md/confirm boolean true
preseed partman/confirm_write_new_label boolean true
preseed partman/choose_partition select finish
preseed partman/confirm boolean true
preseed mdadm/boot_degraded boolean true
# Install GRUB on the MBR of both disks.
preseed grub-installer/only_debian boolean true
preseed grub-installer/with_other_os boolean true
preseed grub-installer/bootdev string "(hd1,1) (hd2,1)"

The installation fails with the error (from syslog):

Apr 19 22:12:13 grub-installer: info: Installing grub on '/dev/sdb /dev/sdc'
Apr 19 22:12:13 grub-installer: info: grub-install supports --no-floppy
Apr 19 22:12:13 grub-installer: info: Running chroot /target grub-install --no-floppy --force "/dev/sdb"
Apr 19 22:12:13 grub-installer: /usr/sbin/grub-setup: warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
Apr 19 22:12:13 grub-installer: /usr/sbin/grub-setup: error: embedding is not possible, but this is required when the root device is on a RAID array or LVM volume.
Apr 19 22:12:13 grub-installer: error: Running 'grub-install --no-floppy --force "/dev/sdb"' failed.

Looking at the state of the disks at that time shows that /dev/sdc is all correctly partitioned and /dev/sdc1 has the bios_grub flag set (parted shows the flag is set) BUT /dev/sdb1 does NOT have the bios_grub flag set!!!

So somehow the installer fails to set this flag on /dev/sdb1 but it does set it on /dev/sdc1.

As an experiment, when the error occurred, I went to the shell with Alt+F2, then did:

chroot /target /bin/bash
parted /dev/sdb
toggle 1 bios_grub
quit

Then went back to the installer (Alt+F1), then selected "Back" and then pressed "Enter" so that it repeated the step of trying to install grub. Grub then installed fine and the installation proceeded to completion and the system is now installed and working fine.
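
For reference, the same workaround can probably be done without the interactive parted prompt. The following is only a sketch: `has_flag` is a hypothetical helper, not part of parted or the installer, and it assumes the machine-readable output of `parted -m`, where each partition line ends with a comma-separated flag list followed by `;`. The real parted calls are left commented out because they need root and an actual disk.

```shell
#!/bin/sh
# Sketch: check whether a partition carries a given flag, reading the output
# of `parted -m <disk> print` on stdin. has_flag is a hypothetical helper.
has_flag() {
    # $1 = partition number, $2 = flag name
    awk -F: -v part="$1" -v flag="$2" '
        $1 == part {
            sub(/;$/, "", $NF)                 # drop the trailing ";"
            n = split($NF, flags, /, */)       # flags are comma-separated
            for (i = 1; i <= n; i++)
                if (flags[i] == flag) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Intended use from the installer shell (after chroot /target):
#   parted -s /dev/sdb set 1 bios_grub on
#   parted -m -s /dev/sdb print | has_flag 1 bios_grub && echo "sdb1 ok"
```

Unlike `toggle`, `set 1 bios_grub on` is safe to run twice, which matters if the flag is already present on one of the disks.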

So clearly the only problem is that the bios_grub flag is not being set on the first disk in the raid setup.

This really needs to be fixed otherwise 10.04 LTS will not be very useful for people with large disks who want to use software raid and custom partitioning...

Thanks a lot in advance!

Best regards,

Anton

Anton Altaparmakov (aia21) wrote :

Forgot to say that this is on 64-bit Ubuntu (amd64) and that the installation is started using DHCP+PXE booting and the pxelinux.cfg/default entry which kicks off the installation is:

LABEL install
 kernel ubuntu-installer/amd64/linux
 append vga=normal initrd=ubuntu-installer/amd64/initrd.gz ks=http://bes.csi.cam.ac.uk/kickstart/ks.cfg --

Where the ks.cfg is the config file I have attached to this bug report.

Colin Watson (cjwatson) wrote :

Thanks for the information. I'm looking into this.

Changed in partman-base (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
status: New → Confirmed
Colin Watson (cjwatson) wrote :

I can deduce *what* is happening from the logs - the 'method' file is going missing from partman's internal data for the partition that is to become /dev/sdb1 - but not why. Fortunately this reproduces easily in a KVM instance, so I'll take it from there.

Changed in partman-base (Ubuntu):
status: Confirmed → In Progress
Colin Watson (cjwatson) wrote :

So, what's happening is roughly as follows:

  * RAID partitioning - or, in general, any multi-disk partitioning - operates by running /bin/autopartition on each of the disks in turn.
  * /bin/autopartition expands the recipe into a usable internal form and then calls /bin/perform_recipe, which has 'clean_method' (a function in /lib/partman/lib/recipes.sh) as one of its steps.
  * clean_method removes any old method files from all partitions on *all* disks. This makes sense in isolation, when you're autopartitioning a single disk and don't want to have stale information lying around, but not when autopartitioning multiple disks.

The upshot is that only method files from the last disk are taken into account, which is of course bad and results in the symptom you're seeing. I'll see if I can figure out a way to fix this as non-intrusively as possible; I agree that if possible it should be fixed for 10.04 LTS.
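
The sequence above can be illustrated with a toy model. This is a simplified simulation, not partman's actual code: the directory layout, the `part1` name, and the bodies of `clean_method` and `autopartition` are stand-ins invented for the example. It only shows why cleaning all disks inside a per-disk loop leaves method files on the last disk alone.

```shell
#!/bin/sh
# Toy model of the bug. Paths and function bodies are illustrative
# assumptions, not partman's real implementation.
set -e
DEVICES=$(mktemp -d)
mkdir -p "$DEVICES/sdb/part1" "$DEVICES/sdc/part1"

# Stand-in for clean_method: removes method files on *all* disks.
clean_method() {
    rm -f "$DEVICES"/*/*/method
}

# Stand-in for autopartition on one disk: clean first, then record the
# method for that disk's first partition.
autopartition() {
    clean_method
    echo biosgrub > "$DEVICES/$1/part1/method"
}

# Multi-disk partitioning runs autopartition on each disk in turn.
for disk in sdb sdc; do
    autopartition "$disk"
done

# Report which disks still have a method file.
for disk in sdb sdc; do
    if [ -f "$DEVICES/$disk/part1/method" ]; then
        echo "$disk: method kept"
    else
        echo "$disk: method lost"
    fi
done
```

Running it prints "sdb: method lost" and "sdc: method kept", matching the reported symptom of the flag being set on /dev/sdc1 only.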

Thanks for your detailed report, which made it much easier to reproduce and diagnose this!

affects: partman-base (Ubuntu) → partman-auto (Ubuntu)
Changed in partman-auto (Ubuntu Lucid):
milestone: none → ubuntu-10.04
Colin Watson (cjwatson) on 2010-04-21
Changed in partman-auto (Ubuntu Lucid):
status: In Progress → Fix Committed
Anton Altaparmakov (aia21) wrote :

Hi Colin,

Thanks for the quick fix. Can you tell me which build of the package contains it once it is in the official Ubuntu Lucid tree? I will then do another install (I have a second identical server waiting to be built, and I can wait another day or two before doing it so we can make sure it is working now).

Best regards,

Anton

Changed in partman-auto (Ubuntu Lucid):
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Anton Altaparmakov (aia21) wrote :

(sorry about the status change - I thought clicking on it might bring up further details and instead it changed it, oops - I changed it back)

Colin Watson (cjwatson) wrote :

This bug will be automatically closed once the source package is accepted; it will probably then take a couple of hours to build and publish, and the next daily build after that should contain it. I expect this to be no earlier than Friday, possibly Saturday, as we're currently frozen for the RC.

Anton Altaparmakov (aia21) wrote :

Ok, thanks. I will try an install on Monday; hopefully it will be in by then.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-auto - 89ubuntu7

---------------
partman-auto (89ubuntu7) lucid; urgency=low

  * Run clean_method before starting autopartitioning instead of in the
    middle of performing a recipe, and call autopartition just once for
    multi-disk partitioning. This means that methods applied to partitions
    of physical disks in RAID recipes are applied to all disks rather than
    just the last one (LP: #566965).
 -- Colin Watson <email address hidden> Wed, 21 Apr 2010 10:43:22 +0100
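
The reordering described in this changelog entry can be sketched with a toy model. As before, this is an illustration under assumptions and not the real patch: the directory layout and the `clean_method` and `apply_recipe` functions are hypothetical stand-ins. Stale method files are cleaned once up front, then methods are recorded for every disk in a single pass.

```shell
#!/bin/sh
# Toy model of the fixed ordering. Paths and function bodies are
# illustrative assumptions, not the actual partman-auto change.
set -e
DEVICES=$(mktemp -d)
mkdir -p "$DEVICES/sdb/part1" "$DEVICES/sdc/part1"
echo stale > "$DEVICES/sdb/part1/method"    # leftover from an earlier run

# Stand-in for clean_method: removes method files on all disks.
clean_method() {
    rm -f "$DEVICES"/*/*/method
}

# Stand-in for applying the recipe to one disk.
apply_recipe() {
    echo biosgrub > "$DEVICES/$1/part1/method"
}

# Clean once, before any disk is partitioned.
clean_method
for disk in sdb sdc; do
    apply_recipe "$disk"
done

for disk in sdb sdc; do
    echo "$disk: $(cat "$DEVICES/$disk/part1/method")"
done
```

Both disks now report "biosgrub", and the stale entry on sdb is still gone.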

Changed in partman-auto (Ubuntu Lucid):
status: Fix Committed → Fix Released