Degraded Raid 6 boot fails with "invalid arch independent ELF magic" or other errors

Bug #960322 reported by Ian Macintosh on 2012-03-20
This bug affects 4 people
Affects: grub2 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

To replicate: Create or build a server with sufficient drives for a Raid 6 array, remove one of the raid drives, and try to boot.

In my case I created an ESXi 5.0 VM running Ubuntu Oneiric 11.10 64-bit Server (ubuntu-11.10-server-amd64.iso) with 1 GB RAM and 4 x 3GB drives. Manual setup: partition the 4 drives, Raid 6 them, and create an LVM vg with 0.5G /boot, 1G swap and the balance on /.

It installs and reboots without an issue. I shut it down, removed /dev/sda and tried to boot.

Fails with "invalid arch independent ELF magic"
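
For context on the layout above: RAID 6 spends two members' worth of space on parity and is designed to survive the loss of any two members, so a single missing drive should still leave a bootable array. A minimal sketch of the capacity arithmetic follows; raid6_usable is a hypothetical helper introduced here for illustration, and it ignores partition and metadata overhead.

```shell
# raid6_usable N SIZE: print the usable capacity of an N-member RAID 6
# array whose members are SIZE units each. Two members' worth of space
# goes to parity. Hypothetical helper; overhead is ignored.
raid6_usable() {
    n=$1
    size=$2
    if [ "$n" -lt 4 ]; then
        echo "RAID 6 needs at least 4 members" >&2
        return 1
    fi
    echo $(( (n - 2) * size ))
}
```

With the values from this report, `raid6_usable 4 3` prints 6, i.e. roughly 6 GB for the volume group before overhead.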

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: grub2 1.99-12ubuntu5
ProcVersionSignature: Ubuntu 3.0.0-12.20-server 3.0.4
Uname: Linux 3.0.0-12-server x86_64
ApportVersion: 1.23-0ubuntu3
Architecture: amd64
Date: Tue Mar 20 15:30:10 2012
InstallationMedia: Ubuntu-Server 11.10 "Oneiric Ocelot" - Release amd64 (20111011)
ProcEnviron:
 LANGUAGE=en_GB:en
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: grub2
UpgradeStatus: No upgrade log present (probably fresh install)

Ian Macintosh (ian-macintosh) wrote :
Phillip Susi (psusi) wrote :

Please run sudo dpkg-reconfigure grub-pc and make sure you have installed grub to all of the drives.
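
Installing to every member disk can also be scripted. A minimal sketch, assuming the members are whole disks such as /dev/sda through /dev/sdd as in the report above; install_grub_all and DRY_RUN are hypothetical names introduced here, not part of grub.

```shell
# install_grub_all DISK...: run grub-install once per listed disk.
# Hypothetical helper. With DRY_RUN=1 it only prints the commands it
# would run, which is useful for checking the disk list first.
install_grub_all() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "grub-install $disk"
        else
            # Needs root; stop at the first failure.
            grub-install "$disk" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 install_grub_all /dev/sda /dev/sdb /dev/sdc /dev/sdd` prints the four grub-install commands without touching any disk.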

Changed in grub2 (Ubuntu):
status: New → Incomplete
Ian Macintosh (ian-macintosh) wrote :

I had done that already. But I redid it one more time anyway to be 100% certain. I then removed /dev/sda and it failed as given above. No change.

But then I decided to test by removing different drives, not just /dev/sda.

If I remove /dev/sdb, the exact errors given are:

error: fd0 read error.
error: unknown LVM metadata header.
error: fd0 read error.
error: no such disk.

The "error: fd0 read error." messages are normally present and not relevant.

At grub rescue>, "set" gives
prefix=(vg-boot)/grub
root=vg-boot

and "ls (vg-boot)/grub", as expected given the LVM error above, gives
error: no such disk.

If I remove /dev/sdc, in addition to the normal "fd0" messages, I only get
error: invalid arch independent ELF magic.

If I remove /dev/sdd, it boots perfectly fine, except for the degraded raid boot notices (I have "bootdegraded=yes" on the kernel command line).

So it appears that the behaviour is completely dependent on which drive of the raid set has failed.

Slightly strange behaviour after reconnecting /dev/sdd: grub flashed up the message "Invalid environment block" as it booted, but then it started up fine. Of course, I had to run 'mdadm --add /dev/md0 /dev/sdd1' after startup to get the array back into full operation.
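
Whether an array is still degraded after re-adding a member can be read from /proc/mdstat, where each array's status field shows "U" for a working member and "_" for a missing one. The sketch below lists arrays whose status field contains "_"; degraded_arrays is a hypothetical helper name, and the parsing assumes the usual mdstat layout (an "mdN :" line followed by a line ending in a field such as [UUU_]).

```shell
# degraded_arrays [FILE]: print the name of every md array whose member
# status field (e.g. [UUU_]) contains "_", meaning a member is missing.
# Reads /proc/mdstat by default. Hypothetical helper for illustration.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }          # remember the current array
        /\[[U_]+\]/ {
            # Extract the [UU_U]-style field and look for a missing slot.
            if (match($0, /\[[U_]+\]/) &&
                index(substr($0, RSTART, RLENGTH), "_"))
                print name
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means no array is reported as missing a member. An array that is still rebuilding may continue to be listed until the resync completes.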

I think I should retest the entire scenario, but remove LVM from the equation. Will post results in a bit.

Ian Macintosh (ian-macintosh) wrote :

Fresh install with 4 x 3GB drives (zeroed), each with 3 primary partitions (512M, 512M and the balance, 2.2G), configured as md0, md1 and md2 and allocated to /boot, swap and / respectively, with ext4 filesystems.

apt-get update, apt-get upgrade, apt-get dist-upgrade, reboot, dpkg-reconfigure grub-pc & set "bootdegraded=true", shutdown, then removed each drive in turn and attempted to boot.

Removed /dev/sda = "error: invalid arch independent ELF magic"
Removed /dev/sdb = "error: file not found"
Removed /dev/sdc = no visible errors, grub menu displays briefly, apparently attempts to load Linux but reboots back to BIOS POST within seconds.
Removed /dev/sdd = boots degraded & /proc/mdstat displays degraded raid6 with /dev/sddX missing on all arrays

So, small differences in behaviour, but essentially the same. LVM appears to be unrelated to this bug.

summary: - Degraded Raid 6 boot fails with "invalid arch independent ELF magic"
+ Degraded Raid 6 boot fails with "invalid arch independent ELF magic" or
+ other errors
Ian Macintosh (ian-macintosh) wrote :

I spotted this on gnu.org

http://savannah.gnu.org/bugs/?35843

Virtually identical, though he is using GPT. Seems like a definite grub2 1.99 bug.

Anyone got a PPA for grub 2.00-beta2 or later?

Phillip Susi (psusi) wrote :

The "invalid arch independent ELF magic" error is generally caused by trying to boot the BIOS grub MBR while having the EFI grub installed in /boot, or vice versa. Can you run this script and post the output:

http://sourceforge.net/projects/bootinfoscript/
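
As a quick sanity check alongside that script: grub modules (*.mod) are ELF objects, so each one should begin with the four bytes 0x7f 'E' 'L' 'F'. The sketch below flags module files that do not. The helper names are hypothetical, the module directory (/boot/grub on a 1.99 install) is an assumption, and only the first four bytes are tested, not the other header fields grub may also validate.

```shell
# has_elf_magic FILE: succeed if FILE starts with the 4-byte ELF magic
# (0x7f 'E' 'L' 'F'). Hypothetical helper for illustration.
has_elf_magic() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# check_grub_modules DIR: print every *.mod file in DIR that lacks the
# ELF magic. Prints nothing if all modules look like ELF files.
check_grub_modules() {
    for mod in "$1"/*.mod; do
        [ -e "$mod" ] || continue       # directory has no *.mod files
        has_elf_magic "$mod" || echo "not ELF: $mod"
    done
}
```

Running `check_grub_modules /boot/grub` on the installed system checks the files as the running kernel sees them. It does not show what grub itself reads from a degraded array at boot time, so a clean result here does not rule out a problem in grub's own raid code.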

Ian Macintosh (ian-macintosh) wrote :

Output from boot_info_script.sh

Ian Macintosh (ian-macintosh) wrote :

This is another fresh Oneiric install, this time to an 8 drive x 3G raid 6 with ext2 on /boot, reiserfs on / and 2g swap all on LVM.

If I remove /dev/sda the same error occurs as with the previous incarnations.

Are you having trouble replicating this?

Every time a coconut for me.

Phillip Susi (psusi) wrote :

Hrm.. that looks fine. I'm starting to think there may be a bug in the grub raid6 driver.

Changed in grub2 (Ubuntu):
status: Incomplete → New
Bekir Dogan (bekirdo) on 2012-07-20
security vulnerability: no → yes
security vulnerability: yes → no
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed
Tony Travis (ajtravis) wrote :

Same problem booting from a degraded RAID6 under Ubuntu 12.04.3 LTS.
RAID6 is total 4TB, with an MSDOS disk label (first partition is 2TB)
Booted fine when fully sync'ed, but failed to boot when degraded.
