grub-installer fails to install on a raid1 array

Bug #527401 reported by Mathias Gug on 2010-02-24
This bug affects 28 people
Affects                       Importance  Assigned to
grub2 (Ubuntu)                High        Colin Watson
grub2 (Ubuntu Lucid)          High        Unassigned
partman-base (Ubuntu)         High        Colin Watson
partman-base (Ubuntu Lucid)   High        Colin Watson

Bug Description

Binary package hint: grub-installer

While trying to install a system with / on RAID5 and /boot on RAID1 using preseed, grub-install fails to install correctly:

Feb 24 21:43:59 grub-installer: info: Installing grub on '/dev/vda /dev/vdb /dev/vdc'
Feb 24 21:43:59 grub-installer: info: grub-install supports --no-floppy
Feb 24 21:43:59 grub-installer: info: Running chroot /target grub-install --no-floppy --force "/dev/vda"
Feb 24 21:44:00 grub-installer: /usr/sbin/grub-probe: error:
Feb 24 21:44:00 grub-installer:
Feb 24 21:44:00 grub-installer: no mapping exists for `md0'
Feb 24 21:44:00 grub-installer: .
Feb 24 21:44:00 grub-installer: Auto-detection of a filesystem module failed.
Feb 24 21:44:00 grub-installer: Please specify the module with the option `--modules' explicitly.
Feb 24 21:44:00 grub-installer: error: Running 'grub-install --no-floppy --force "/dev/vda"' failed.

I've attached the installation syslog run with DEBCONF_DEBUG set to developer.

Mathias Gug (mathiaz) wrote :
tags: added: iso-testing
Colin Watson (cjwatson) on 2010-03-15
affects: grub-installer (Ubuntu) → grub2 (Ubuntu)
Colin Watson (cjwatson) on 2010-03-15
Changed in grub2 (Ubuntu Lucid):
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-10.04-beta-2
assignee: nobody → Colin Watson (cjwatson)

Just ran into this this morning when test-installing a server with Beta 1 - it would be very nice to have this fixed for this release.

Yes, it is the very top thing on my list and it will be fixed.

Colin Watson (cjwatson) wrote :

I found that after I'd fixed this the initramfs failed to load md modules reliably before trying to bring up the array. I'm creating a bug task for this, and will have a look after I've uploaded the grub2 fix (at which point it will be easier to debug this sort of thing).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 1.98-1ubuntu2

---------------
grub2 (1.98-1ubuntu2) lucid; urgency=low

  * Fix LVM/RAID probing in the absence of /boot/grub/device.map
    (LP: #525085, #527401).
  * Backport my upstream patch to copy .mo files from /usr/share/locale to
    match where 'make install' puts them (LP: #537998).
  * Look for .mo files in /usr/share/locale-langpack as well, in preference
    (LP: #537998).
  * Don't generate /boot/grub/device.map during grub-mkconfig (we already
    suppressed it during grub-install, but then grub-mkconfig generated it
    shortly afterwards, producing confusing results).
  * Don't run /etc/grub.d/README, even if it somehow ended up being
    executable (LP: #537123).
 -- Colin Watson <email address hidden> Mon, 22 Mar 2010 19:57:10 +0000

Changed in grub2 (Ubuntu Lucid):
status: Triaged → Fix Released
Kees Cook (kees) on 2010-03-23
Changed in mdadm (Ubuntu Lucid):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
milestone: none → ubuntu-10.04-beta-2

Wow - great work!

Verified that grub now installs on RAID1 system without errors.

Have not yet verified that it boots correctly as I'm running into some
graphics issues at boot-time but I suspect this is unrelated.

Thanks,

-stephen

--
Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)

Colin Watson (cjwatson) wrote :

The problem with mdadm in the initramfs was as follows:

The MD 0.90 metadata format operates by putting a superblock at the *end* of the physical volume. (This is arguably a little silly and the 1.x formats work differently, but GRUB doesn't support them yet so they're not an option.) This means that when mdadm starts up, it's told to just look at everything in /proc/partitions, and it finds that both /dev/sda and /dev/sda1 appear to be the same physical volume; it's not then smart enough to say "oh, I'll take the partition then".

The reason I suddenly ran into this is that I'm using exact-number-of-MiB disks as a result of using kvm (though maybe most disks are exact-MiB? I'm not sure) and so the change to use MiB alignment by default means that the end of the partition lands exactly on the end of the disk. Leaving a small gap at the end fixes this. Therefore, until such time as we can move to the newer mdadm metadata format, I think the right answer is for partman to ensure that there's always a gap at the end of the disk.
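
Colin's explanation can be checked with a little arithmetic. The MD 0.90 superblock sits in the last 64 KiB-aligned block, 64 KiB before the end of the device, so a partition that starts on a 64 KiB boundary (MiB alignment qualifies) and ends exactly at the end of the disk shares its superblock location with the whole disk. A minimal sketch - the sizes are illustrative, not taken from this report:

```shell
# MD 0.90 superblock offset: last 64 KiB-aligned block, 64 KiB from
# the end of the device. All sizes here are in KiB.
sb_offset_kib() {
    echo $(( ($1 & ~63) - 64 ))
}

disk_kib=$((2 * 1024 * 1024))               # a 2 GiB disk (illustrative)
part_start_kib=1024                         # partition starts at 1 MiB
part_kib=$(( disk_kib - part_start_kib ))   # ...and ends at the disk's end

disk_sb=$(sb_offset_kib "$disk_kib")
part_sb=$(( part_start_kib + $(sb_offset_kib "$part_kib") ))

echo "superblock seen from the whole disk: $disk_sb KiB"
echo "superblock seen from the partition:  $part_sb KiB"
# Both offsets coincide: mdadm finds the same superblock whether it
# scans /dev/sda or /dev/sda1, which is exactly the ambiguity above.
# Leaving a gap at the end of the disk (the partman fix) breaks the tie.
```

Any end-of-disk gap smaller than 64 KiB already shifts the whole-disk offset away from the partition's, which is why partman only needs a small one.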

affects: mdadm (Ubuntu Lucid) → partman-base (Ubuntu Lucid)
Changed in partman-base (Ubuntu Lucid):
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-base - 139ubuntu2

---------------
partman-base (139ubuntu2) lucid; urgency=low

  * Always leave a small gap at the end of the disk (except on device-mapper
    devices), to avoid confusing mdadm by leaving the MD 0.90 superblock at
    the end of the disk as well as the end of a partition (LP: #527401).
 -- Colin Watson <email address hidden> Fri, 26 Mar 2010 11:32:01 +0000

Changed in partman-base (Ubuntu Lucid):
status: In Progress → Fix Released
Hosed (liveonaware) wrote :

I have a RAID1 server with two Seagate hard disks. I installed Karmic (stable release) on them, but the installation didn't finish because it said grub wouldn't install. Is this fixed now in 10.04 LTS? Should I install when it's released? I'm still using Jaunty on my server.

That could be what I fixed, or could be something else. Please try
10.04 LTS Beta 2 when it's released, and if it still fails then please
file a new bug report.

RJARRRPCGP (rjarrrpcgp) wrote :

Objection. Not the exact bug! My problem occurs before I even get to the GRUB installation stage!

It's the partitioning.

RJARRRPCGP (rjarrrpcgp) wrote :

The error messages I get are not the same, and apparently it happens before the installer even starts copying any files!

Colin Watson (cjwatson) wrote :

I've reverted the incorrect duplication; see my comment in bug 549258.

I can confirm that the updates work.

I have installed a RAID1 system up to the point where GRUB errors out.
Then I executed a shell, chrooted into the target system, and updated the packages (including grub).
After the updates were installed, I exited back to the installer, installed the GRUB bootloader using the menu item, and it worked like a charm - no error message at all.

(I started the installation from an Alternative disc in Expert mode.)

Scratch my earlier comment. I've seen so many strange things this week during installation procedures, that my conclusion is not at all reliable.

Mark Foster (fostermarkd) wrote :

Following up on comments in bug 532729:
Tried this (again) on Lucid Beta 2 and it gets much further - the grub-install step does not FAIL like it did before.

However, on subsequent boot the machine brings up a busybox prompt, as shown in the attachment.

Mark Foster (fostermarkd) wrote :

Addendum to my last comment... after rebooting from busybox the system did come up. So it looks as though the grub-install problem is fixed here.

Rolf Bensch (rolfbensch) wrote :

I just installed Lucid Beta 2 Server with a separate RAID1 for /boot. GRUB was installed on both hard disks.

While creating the md devices, the newly added superblocks show up as free RAID devices. To avoid faulty configurations, please hide them.

Dan Stoner (danstoner) wrote :

Installed Lucid 10.04 beta 2 from Alternate CD. I was able to mirror (RAID 1) /boot on 2 TB drives via the installer. I noticed a quick grub install message flash on the screen to show that grub was being installed to both /dev/sda and /dev/sdb.

System appears to boot normally.

Alvin (alvind) wrote :

@Mark Foster.

What happens if you reboot the server a few times? Does it always work, or do you sometimes get the busybox prompt? (I'm asking because this might be a separate bug.)

Sergei Vorobyov (svorobyov) wrote :

Partitioning and RAID building still (in server x64 Beta 2) do not work correctly during installation.
I had 3 identical HDs, each with an identical 100 MB partition (/sda1, /sdb1, /sdc1) at the
beginning of the disk. Trying to build a RAID1 with two active mirrors (/sda1 and /sdb1)
and /sdc1 as a spare, the partitioner does not see (does not offer) /sdc1
as a spare, and there's no way to force it to, except by adding the spare after the install.

Sergei Vorobyov (svorobyov) wrote :

Confirmed: I retried it, erasing all the partitions of the previous installation and recreating
them from scratch, all the same size and all primary:

/sda1 /sda2 ...
/sdb1 /sdb2 ...
/sdc1 /sdc2 ...

intending to build RAID1s on /md0 (/sda1, /sdb1, /sdc1 as spare) and
/md1 (/sda2, /sdb2, /sdc2 as spare) for dual boot of Ubuntu 10.04 and CentOS 5.4.

/md0 was OK, but when building /md1 the installer/partitioner does not offer (does not see)
/sdc2 to be used as a spare.

Sergei Vorobyov (svorobyov) wrote :

Looks like I found a remedy (which may also indicate where the error is):

1. Create the RAIDs one at a time, finishing after each one.
2. It says "Starting the partitioner" (apparently it writes and rereads the tables).
3. Select "Configure SW RAID", etc.

It takes more time (each "Starting ..." step takes a minute) but apparently works fine.

Sergei Vorobyov (svorobyov) wrote :

Another error :-(

At the end of the partitioning it says:

ERROR!!!

partition length 5655133440 exceeds the loop-partition-table
imposed maximum of 4294967295

which is strange. The 3 disks I have are just 2 TB each, and I am only
trying to create a RAID0 of three 967 GB partitions (/sd{a,b,c}8, one on each disk),
totaling 2.7 TB.

Sergei Vorobyov (svorobyov) wrote :

Another nasty feature:

it keeps creating "unusable" spaces of 100-200 KB after every RAID /md* -
not a big deal, but nevertheless...

Sergei Vorobyov (svorobyov) wrote :

Buggy as hell: after the installation finished OK (with grub installed to
the MBR of /sda and /sdb), the system cannot boot:

Grub loading stage 1.5

Grub loading, please wait...
Error 17

Sergei Vorobyov (svorobyov) wrote :

Moreover, the "Rescue/repair" option on the server disc is buggy:

1. It asks for the name of the system.
2. It asks you to select a timezone from a mile-long list.
3. It suggests selecting a root filesystem, with no apparent effect.
4. When you pick "install grub", it leads you through the whole disk partitioning again.

Bottom line: avoid Ubuntu at all costs

Sergei Vorobyov (svorobyov) wrote :

1. Ubuntu 10.04 Beta 2 x64 server (installed from CD) would not boot from RAID1; see above.
2. It would not boot even if you dismantle your RAID1 (into 3 pieces in my case) and install on the simple non-RAID ext4 partitions that result.

Workaround:

a) boot from any LiveCD or rescue CD
b) mount your intended root on /mnt, intended /boot on /mnt/boot, and /dev, /proc, /sys, /usr
on the corresponding /mnt/... paths
c) chroot /mnt
d) grub-install /dev/sda (and the same for /sdb, /sdc, just to be on the safe side)
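
The recipe above can be wrapped into a small script. This is a hedged sketch, not tooling from this report: the device names (/dev/md0, /dev/sda, ...) are placeholders, the /boot and /usr mounts from the original steps are omitted for brevity, and DRY_RUN=1 only prints the commands so they can be reviewed before anything runs as root.

```shell
#!/bin/sh
# Reinstall GRUB from a live/rescue CD: mount the target root, bind the
# kernel filesystems, then run grub-install inside a chroot.
# DRY_RUN=1 prints each command instead of executing it.
rescue_reinstall_grub() {
    root_dev=$1; shift        # e.g. /dev/md0 (placeholder)
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi; }
    run mount "$root_dev" /mnt
    for fs in dev proc sys; do
        run mount --bind "/$fs" "/mnt/$fs"
    done
    for disk in "$@"; do      # install to every disk's MBR, as above
        run chroot /mnt grub-install "$disk"
    done
}

DRY_RUN=1 rescue_reinstall_grub /dev/md0 /dev/sda /dev/sdb /dev/sdc
```

Installing to the MBR of every array member, as the last loop does, is what lets the machine still boot after one mirror fails.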

Elementary, Watson.

The biggest question is: why can't they implement this in the installer? (We are talking about version 10.04.)
If the installer is that bad, what can one expect of the rest? (Not to mention how this installer looks -
probably like something from the 60s-70s.)

Which filesystems are you using?
Are you doing it by hand or preseeding?
I just use the generic recipe from the README contained in the
partman-auto-raid package and it works (as I described earlier, it bombed once
afterwards but was fine after that).
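
For readers hunting for that recipe: partman-auto-raid preseed entries follow the pattern `<raidtype> <devcount> <sparecount> <fstype> <mountpoint> <devices>`, one entry per array, each terminated by `.`. A hedged sketch - the device names and filesystems are illustrative, a matching partman-auto/expert_recipe defining the member partitions is also required, and the partman-auto-raid README remains the authoritative reference:

```
d-i partman-auto/method string raid
d-i partman-auto/disk string /dev/vda /dev/vdb /dev/vdc
# /boot on RAID1 across three members, / on RAID5:
d-i partman-auto-raid/recipe string \
    1 3 0 ext3 /boot /dev/vda1#/dev/vdb1#/dev/vdc1 . \
    5 3 0 ext4 /     /dev/vda2#/dev/vdb2#/dev/vdc2 .
```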

Just to reiterate, I successfully installed Ubuntu 10.04 LTS Beta 2 to a
RAID1 config without any problems after the initial bug identified here
was fixed. The grub installer also seemed to install copies of itself to
both drives, which is a plus (I've been doing that manually in the past).

Keep up the good work.

-stephen

--
Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)

Sergei Vorobyov (svorobyov) wrote :

I did it manually, using ext4, first on RAID1 (2 active + 1 spare), which failed to boot after
several attempts. Rescue would not help.

Then I dismantled the RAID1 and manually grub-installed (see above) on /sda1, /sdb1, /sdc1.
Doing the same with RAID1 is not worth the time it takes to load a bunch of modules.

Besides, the installer refused to build a 2.7 TB RAID0 out of three 900 GB partitions
(table too big, see above).

As I said, I did not believe it. I easily did it with a simple mdadm --create after the installation.

Rewrite from scratch

Colin Watson (cjwatson) wrote :

Sergei, could you please file a new bug about this (just in case it's different from the original, which it might well be - it's easier to mark it as a duplicate later than to try to split it up) and attach the 'syslog' and 'partman' installer logs - you can extract them using "Save debug logs" from the installer's main menu.

Sergei Vorobyov (svorobyov) wrote :

One of the issues (the "too large" 2.7 TB RAID the installer refuses to create) is related to/the same as
https://bugs.launchpad.net/bugs/543838

I stupidly did a remote update (kernel included) with a reboot, and the system did not come up.
Apparently, kernel updates modify the MBR. Now I am locked out of my server, so my logs will be
delayed until Monday.

What would be better: configuring the SATA disks as IDE or as AHCI?

The best, I think, is to add a small IDE disk to boot from (rather than booting from RAID). That will never fail.

Sergei Vorobyov (svorobyov) wrote :

I repeated the installation experiment twice, this time with 3 SATA disks on AHCI (yesterday
they were on IDE):

1. with 10.04-beta2-server-amd64

2. with the daily server-amd64 (18/4/2010)

getting basically the same results for /boot on RAID1.

After 1) the system booted once but stopped booting after the update.
After 2) the system would not boot at all.

I am attaching the partman and syslog files from /var/log/installer for 2).

I have to say that the Ubuntu rescue shell is indecent: no file completion, no history, no way
to get back to the previous command, not even ssh or scp (to send logs);
briefly, pre-MSDOS style.

The partitioner still:

1. complains about a "too big" partition table for a 2.7 TB RAID0 of three 0.9 TB chunks
2. leaves "unusable" space at the end of every partition
3. rounds requested partition sizes oddly (you ask for 129 MB, it makes 127.8 MB on one of 3 identical disks)
4. failed to create one of the RAID1s with 2 active and one spare: it does not give an option (does not see) to pick the spare, though it actually sees it later.

The installer also reports an error when you try to pick the cloud packages.
All this makes the installation pretty annoying.

Sergei Vorobyov (svorobyov) wrote :

syslog attachment to the previous message (this system cannot even take more than one attachment ;-)

Sergei Vorobyov (svorobyov) wrote :

Once again, the simple recipe in message #30 above allowed me to boot an unbootable system (a forgotten grub-install?)

Sergei Vorobyov (svorobyov) wrote :

Another ugly thing: fdisk -l complains "/dev/mdX doesn't contain a valid partition table" (see the attachment) about the md devices created by partman. The system is nevertheless operational.

Sergei Vorobyov (svorobyov) wrote :

The server still does not reboot, saying a few times:

init: ureadahead-other main process (number) terminated with status 4

and then freezing.

Faulty everywhere...

Colin Watson (cjwatson) wrote :

I'm sorry, but I can't handle further comments in this bug; it is much too difficult to keep track. Could you please file a new bug report, as I requested in comment 34?

Sergei Vorobyov (svorobyov) wrote :

The partman in the server installer from the daily build 20100419.1 is even buggier.

I decided to simplify matters by installing a small extra IDE/ATA disk for the system,
keeping the 3 x 2 TB SATA disks for manual configuration (since partman is
RAID-ignorant and buggy).

The installer correctly identified the disks as

sda 2TB
sdb 2TB
sdc 2TB
sdd 160GB

I created /, swap, and /home on sdd, leaving the 3 x 2 TB disks unformatted.
Then during the final stage it reported "installing grub on MBR of sda", leaving me no
options. Why? I had not even touched sda.

Not surprisingly, the server cannot boot...
And I have not yet started testing what I wanted.

ceg (ceg) wrote :

The installer fails if there are already MD superblocks on your disks.

They get (partly) assembled, and the installer is not capable of showing or handling them.

Try writing zeros to your disks, or run mdadm --stop and mdadm --zero-superblock from another console.

pedja (poruke) on 2010-05-16
Changed in grub2 (Ubuntu Lucid):
assignee: Colin Watson (cjwatson) → nobody
Whit Blauvelt (whit-launchpad) wrote :

I'm seeing this bug in an install of 10.04.1, although it may be related to my creating the RAID1 array beforehand, since with the WD20EARS drives the cylinder alignment otherwise isn't right - or at least I can't see how to make it so.

Is there a workaround? Or do I have to reinstall entirely, leaving (how much?) extra space after the partitions? Or...? I know a bug report isn't the perfect place to talk about workarounds; still, people do look for them here. I'm sure I'm not the only Ubuntu user who never uses Ubuntu's installer to set up partitions. I've also had cylinder-alignment issues with it on true hardware RAID in the past. It's far less awkward to just boot and partition from something like System Rescue Disk, and then go to the Ubuntu install. Except it looks like, given the current state of things, I filled the disks too far?

Or what does it take to move to the "newer mdadm metadata format"? Is there a grub that _does_ support it?

Whit Blauvelt (whit-launchpad) wrote :

Oh, I see Colin's worked on it (http://www.listware.net/201007/grub-devel/64403-patch-mdadm-1x-metadata-support.html). Now to see if there's an accessible way to build/install and use.

Tim Hockin (thockin-hockin) wrote :

I've got this same problem, it seems.

I originally installed to /dev/sda. I later added 2 new disks. I manually (with the Ubuntu GUI, anyway) made a RAID1 of them, and have copied everything over to them.

Trying to install grub yields:

root@thdesktop:/# grub-install --modules="mdraid raid" /dev/md0
/usr/sbin/grub-probe: error: no mapping exists for `md0'.
/usr/sbin/grub-probe: error: no mapping exists for `md0'.
/usr/sbin/grub-setup: error: no mapping exists for `md0'.

root@thdesktop:/# update-grub
Generating grub.cfg ...
/usr/sbin/grub-probe: error: no mapping exists for `md0'.

root@thdesktop:/# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-2.6.32-25-generic

Looking at /dev/md0 (and the disks that comprise it) I see no trace of GRUB having been installed there.

Do I need to redo the partitions? Differently how? With how much dangling space?

Tim Hockin (thockin-hockin) wrote :

(My install is 10.04)

Lucas (lucascastroborges) wrote :

So - fine, bug fixed! I'm using the latest LTS and it works well.

164747 (jacquet-david) wrote :

I am still experiencing this bug. I tried to install Ubuntu 10.04 Server on three discs: sda, sdb, sdc. They were all partitioned into two primary partitions, one of 2 GB and one of "the rest". I run a /boot partition on RAID1 across the 2 GB partitions, and RAID5 -> encryption -> logical volumes on the "rest" parts.

Everything goes smoothly until the grub part. The screen goes red and says grub-install failed. The installer offered to use the "old" grub instead, but installing that old grub also failed. My workaround (many hours later) was to:

1. Press back
2. Go into shell (bottom in the menu)
3. mount --bind /proc /target/proc
4. mount --bind /dev /target/dev
5. chroot /target
6. bash
7. apt-get update
8. apt-get -y upgrade
9. apt-get install grub
10. exit
11. exit
12. Now go through the grub step, which now worked using the "old" grub

The system booted up nicely but I have not tried rebooting it since.

I think you should file a new bug and attach full installation logs
(/var/log/syslog and /var/log/partman; use "save debug logs" from the
main menu to extract them, or they're saved in /var/log/installer/syslog
and /var/log/installer/partman on the installed system). Your problem
may not have the same cause as this bug and it would be better to start
out without assuming that it does.

Andrei Tinca (andrei-tinca) wrote :

Hi, I'm trying to install 10.04.1 64 bit, and this bug is still open. Any plans to fix it?

linas (linasvepstas) wrote :

I'm hitting this bug, as described in comments #43 and #51.

Running Lucid, 64-bit:

grub-probe -t fs /boot
grub-probe: error: no mapping exists for `md1'.

grub-install --recheck --modules="mdraid raid" /dev/md1
/usr/sbin/grub-probe: error: no mapping exists for `md1'.
/usr/sbin/grub-probe: error: no mapping exists for `md1'.
/usr/sbin/grub-setup: error: no mapping exists for `md1'.

grub-install -v
grub-install (GNU GRUB 1.98-1ubuntu9)

linas (linasvepstas) wrote :

Oh, by the way, I'm running mdraid metadata version 1.0, not 0.90, and according to comment #9 above this may be why I have the problem. I am opening a new bug to report it, as the history above is a bit long and confusing...

linas (linasvepstas) wrote :

I opened bug 701351 to track the nearly-identical problem, as it occurs for mdraid version(s) 1.0, 1.1 and 1.2

Alex (mailatgoogl) wrote :

I see the same bug on Lucid 10.04.3 LTS.

Because I install systems using FAI, I just changed the default settings in /etc/mdadm/mdadm.conf so that mdadm uses metadata version 1.0;
after that, grub installs correctly and I can boot from the RAID1 device.

Metadata 0.90 also works,

but with metadata 1.1 and 1.2 grub-legacy fails... I didn't check GRUB 2.

cat /etc/mdadm/mdadm.conf
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes metadata=1.0
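
When debugging this class of failure, the first thing to establish is which superblock format the array members carry. A hedged helper - the "Version :" line is an assumption about the `mdadm --examine` output format of this era, so check your own mdadm's output before relying on it:

```shell
# Extract the metadata version from `mdadm --examine` output, e.g.:
#   mdadm --examine /dev/sda1 | md_metadata_version
# Prints something like "0.90.00" or "1.2". GRUB of this era only
# understood the 0.90 on-disk format.
md_metadata_version() {
    sed -n 's/^[[:space:]]*Version[[:space:]]*:[[:space:]]*//p' | head -n1
}
```

Anything reporting 1.1 or 1.2 would hit the failure Alex describes; recreating the array with --metadata=0.90 (or 1.0, whose superblock also sits at the end of the device) sidesteps it.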

