Bug #569900 “partman sometimes creates partitions such that ther...” : Bugs : partman-base package : Ubuntu

Dustin Kirkland  (kirkland) on 2010-04-25

Changed in mdadm (Ubuntu):
importance:	Undecided → High
Changed in mdadm (Ubuntu Lucid):
milestone:	none → ubuntu-10.04

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-04-25:

#1

Okay, a little more information ...

Looks like the installer is trying to "partition" /dev/md0 (and comes up with this /dev/md0p1). This doesn't seem to work very well at all.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-04-25:

#2

Okay, so I finally managed to get Lucid installed on a RAID1 root disk, but the procedure isn't pretty...

* I booted a Desktop livecd
* Popped open a terminal
* partitioned both of my disks using fdisk, sda1 and sdb1, both 0xfd (linux raid)
* installed mdadm in the live cd environment
* mdadm --create /dev/md0 -n 2 -l 1 /dev/sda1 /dev/sdb1
* mkfs.ext4 /dev/md0
* then installed Ubuntu using the wizard
* after the install completed, drop to a shell and chroot to /target, and apt-get install mdadm

Note that the mkfs.ext4 step seems to be the critical one... If I don't do that, and I fire up the installer, it goes and tries to partition /dev/md0, yielding a /dev/md0p1 which is unusable. This seems to be the same thing the server installer did too.

Now, I'm up and running Ubuntu Desktop with / on a RAID1. Not Server. So this still isn't ideal. But this work around hopefully shows a bit more about where the bug is.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-04-26:

#3

Also related, I'm seeing some problems with the line "DEVICE partitions" in /etc/mdadm/mdadm.conf.

It seems that it interprets both /dev/sda and /dev/sda1 as separate "partitions" containing RAID superblocks. This seems wrong, to me, as only /dev/sda1 should be in consideration. This causes mdadm to start some funny, incorrect raid devices. This might be part of the cause of this bad behavior.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-26:

#4

I'm just curious why it this works in a KVM environment and not on real hardware. I obviously know what the difference in in terms of "hardware" but from the POV of the raid it shouldn't be that much different, right?

Those disk are identical, there's no other sd* there to throw it off, it doesn't boot from sdb, nothing else fancy is going on? Did you try zeroing the two disks before install (like with dd if=/dev/zero of=/dev/sd{a,b}) ? Just to be sure.

Revision history for this message

ceg (ceg) wrote on 2010-04-26:

#5

Your probably hitting several bugs.

I have also seen some differences in the geometry of virtual drives (at least I tent to size them to some human friendly number). I have seen some "unused space" things and (c)fdisk complaining.

IIRC the most usefull way with md's partitionable array feature is to use the entire disk ("sdX") as members. Then you can partition the md device, and by this partition all mirror devices at the same time. If you create the array from one member + "missing" the partition table on the disk gets used for the raid. Mdadm will create only a configurable number of device nodes for partitions though.

Does blkid detect sdX wrongly as raid?

mdadm --incremental defaults to set up arrays with partitionable device nodes if auto= is not defined otherwise in mdadm.conf. Not a bad idea in general, but mdadm is only able to do so during initramfs. (map file is missing later). Also devices are not automatically removed from the array and block readdition by incremental. See Bug #495370.

Bug #551719 enabled kernel raid autodetection disturbs udev/mdadm (initramfs & later)

Revision history for this message

ceg (ceg) wrote on 2010-04-26:

#6

i.e. you'll get incomplete md_XpY devices if mdadm is not (re)installed (or you create a mdadm.conf manually) after creating the array since ages. Bug #252345

Revision history for this message

ceg (ceg) wrote on 2010-04-26:

#7

(Just to reassure you. I aggree a https://wiki.ubuntu.com/ReliableRaid could be seen as release critical for a LTS release, not only for a server/workstation edition.)

Revision history for this message

Serge van Ginderachter (svg) wrote on 2010-04-26:

#8

I just downloaded ubuntu-10.04-rc-server-amd64.iso and used it on a simple desktop machine where I added a second disk, both on SATA. First disk is around 500GB, second only 160GB.
Running the installer, manually partitioning, I made a partition of around 160GB on both disks, and left the other part of the 500GB one unused. Both 160GB partitions weren't exactly as big which left a small piece 'unused' as reported by the installer.
Further on I installed leaving most options default. One MD raid device on both 160GB partitions, formatting md0 as ext4 and putting / on it.

Reboot, no problem.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-26:

#9

I have an older machine, with two identical 80GB IDE harddisks. Unfortunately it is not 64bit so I grabbed the 32bit server RC iso and installed on the two, with 1-1 big raid partition (sda1/sdb1 , and RAID1, without swap, and ext4 on it.

The exact steps:

- create bootable USB stick with 10.04-rc server 32bit on it (I've created this with Karmic)
- boot the system from the USB, install
- partitioned the two disks manually for sda1/sdb1 as RAID partitions (no swap)
- created md0 as a RAID1 raid
- md0 mounted as / (root partition) with ext4 (everything else left as default)
(aswered 'Yes' when asked if I wanted to boot in case of degraded RAID)
- installed the system
- reboot, everything is fine, checked /proc/mdstat and the RAID was indeed sync'ing, but it did boot and no mention of md0pX

A small note: I didn't see ANY kind of messages during the boot, no GRUB menu, no nothing, just a cursor in the upper left, then the login prompt. A bit strange ;)

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-26:

#10

Then I reinstalled with the 'boot degraded RAID' option set to 'No', and it's still working, no errors, no problems at booting.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-26:

#11

Turns out my 'old' system supports 64bit, so I've reinstalled again, this time from ubuntu-10.04-rc-server-amd64.iso. Same setup as above, and it IS working as expected. It's sync'ing but it booted without problems (with the 'boot degraded RAID' option set to 'No').

I can't seem to reproduce this bug... for now.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-26:

#12

Looking in /dev/disk/by-uuid I see one link pointing to ../../md0, and there's still no md0pX or anything else out of the ordinary.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-04-26:

#13

Imre and Serge-

Thanks... So this must be a disk geometry specific problem. (I actually went to Fry's and bought a new motherboard/cpu after the first few failures, thinking it was flaking out; same thing new cpu/motherboard).

My two disks are 500.1 GB SATA drives. One is Maxtor, the other Seagate.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-26:

#14

I'll have two 750GB SATA drives tomorrow, if it's not resolved by then, I'll try with those.

Revision history for this message

ceg (ceg) wrote on 2010-04-26:

#15

What does blkid return before md's are set up? (maybe booting with break=top or break=premount)

Revision history for this message

ceg (ceg) wrote on 2010-04-26:

#16

also maybe try bootoption raid=noautodetect to get the kernel detection out of the way

Revision history for this message

Colin Watson (cjwatson) wrote on 2010-04-26: Re: [Bug 569900] Re: mount: mounting /dev/md0 on /root/ failed: Invalid argument

#17

On Mon, Apr 26, 2010 at 03:06:55AM -0000, Dustin Kirkland wrote:
> It seems that it interprets both /dev/sda and /dev/sda1 as separate
> "partitions" containing RAID superblocks.

This happens if /dev/sda1 extends all the way to the end of the disk. I
made a change last month to prevent this:

partman-base (139ubuntu2) lucid; urgency=low

  * Always leave a small gap at the end of the disk (except on device-mapper
    devices), to avoid confusing mdadm by leaving the MD 0.90 superblock at
    the end of the disk as well as the end of a partition (LP: #527401).

-- Colin Watson <email address hidden> Fri, 26 Mar 2010 11:32:01 +0000

Perhaps you can investigate why this apparently isn't working, based on this
history?

Revision history for this message

ceg (ceg) wrote on 2010-04-26: Re: mount: mounting /dev/md0 on /root/ failed: Invalid argument

#18

If you did not re-format you may also see Bug #527401. (Other install issues in the comments also).

Generally, if your disks already have some preexisting superblocks on them (even though you deleted the partition) blkid, partman, mdadm etc. can get confused. When you recreate (similar) partitions they are redetected and md devices set up.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-04-26: Re: [Bug 569900] Re: mount: mounting /dev/md0 on /root/ failed: Invalid argument

#19

ceg-

I found it best if I zero'd the entire disk between reinstallation attempts.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-04-27: Re: mount: mounting /dev/md0 on /root/ failed: Invalid argument

#20

As I said on IRC, I've got the two 750GB disks and tried installing them on RAID1, but everything wen't fine, there were no problems at boot.
I've tried with KVM with 250GB disks also, no problems there either.

Can't reproduce.

Revision history for this message

HX_unbanned (linards-liepins) wrote on 2010-04-28:

#21

As of statements made regarding unreproducable bug, I am setting this bug to Invalid. Fell free to change to Incomplete if it occurs again ... after that, there might be regression testing made.

Changed in mdadm (Ubuntu Lucid):
status:	New → Invalid

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-01:

#22

I'm having the same issue installing 10.04 Server on a Supermicro 6013P-T using two identical 500GB Seagate drives. The installation proceeds fine, grub says it installs, and upon reboot I get this mount error:
mount: mounting /dev/disk/by-uuid/<uuid> on /root/ failed: Invalid argument

The UUID for md0 doesn't exist in /dev/disk/by-uuid in the initramfs shell. If I boot off the desktop LiveCD and install mdadm I can do a scan and detect both md0 and md1 (swap), and mount them. md0 is resyncing when I do this. fsck reports no errors on md0. I've spent countless hours on this, very frustrating.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-05-01:

#23

Thanks for confirming, Alex.

So one of my two drives is a Seagate 500GB (the other is a Maxtor). Maybe that will help us narrow the affected geometry.

Changed in mdadm (Ubuntu):
status:	Invalid → Confirmed
Changed in mdadm (Ubuntu Lucid):
status:	Invalid → Confirmed

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-01:

#24

I've now tried just installing on a single 500GB Seagate (different drive, was going to be a spare in my RAID1 array), and it won't boot either. This error is:
mount: mounting /dev/sda1 on /root failed: No such device

However in initramfs I'm able to mount /dev/sda1 /root with no problems. I don't understand, is it a controller issue? This box has SiL3112A controllers but I have the fakeraid disabled in the BIOS.

I'm sorry to report the same issue in a non-raid config, I hope that doesn't throw your original bug report off track. :(

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-03:

#25

I zero'd the drives using dd if=/dev/zero of=/dev/sd{a,b,c} bs=4k and still the system will not boot after install completes, with the same "mounting failed: invalid argument". I also have the md1p1 device as does Dustin.

If I boot into Live CD or even Recovery mode off of the 10.04 Server CD I can see and mount the md* devices with no problems. I do see that md0 is syncing.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-05-03:

#26

@Alex: could you tell us the exact exact steps you took to install it? Like everything, every option, what you did, how you installed. Could it be that it doesn't like 500GB disks? :) I've installed on 80GB, 750GB, with RAID1, and found nothing. I have an 500GB WD too, and I can try that, but I don't have any fancy controller (I think). SATA is set to AHCI (as opposed to IDE) in the BIOS though, not sure if that matters.

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-03:

#27

I believe I accept all defaults up until partitioning. At that point I created automatic partitions on all 3 x 500GB SATA drives. I designed the first partition (sda1,sdb1,sdc1) on all 3 drives as bootable. I then create md0 setting sda1 and sdb1 as the active drives and sdc1 as the spare. I click Finish, then create md1 setting sda5 and sdb5 as active and sdc5 as spare. I click Finish, then set md0 to use ext4 and be the root filesystem, then set md1 to be swap. Partitioning finishes, files are installed, I select LAMP and OpenSSH Server for the software options. Finally I say Yes to install grub to the MBR, it appears to successfully install to sda and sdb. Install completes, CD pops out, and server reboots.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-05-03:

#28

Automatic partitions? Could you try it manually, with only one disk, no RAID, one partition (sda1), ext4, no swap ?

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-03:

#29

I'm at work for the day, I'll try your suggestion tonight. Note that I have tried to install to a single disk, no RAID, though I did give it a swap partition and I also did not zero the drive (described in comment #24).

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-04:

#30

Imre, your suggestion worked. I zeroed the first several MB of the disk and performed the mdadm --zero-superblock as recommended by ceg in bug #527401. I did the install the same, except I manually created the partition and did not create swap.

I then tried installing on RAID1 again, the only difference being the addition of a swap partition. All other options were the same as the single drive install. It fails with the "Invalid Argument" error. :(

I'll try again with no swap partition.

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-04:

#31

And with no partitions the server won't even boot, I get an "operating system not found" message.

I give up, I've spent 20 or more hours across a dozen or more installs with 4 different hard drives in this server, and no version of Ubuntu from 9.10 to 10.04 Alpha 2 to now has been able to get RAID working on my server. I've been using various versions of Linux for more than 10 years, though rarely install the OS, so I'm not brand new to this and I've got an 8.10 server sitting next to this one that installed with RAID1 just fine over a year ago. Thanks for your suggestions.

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-04:

#32

Sorry, comment 31 should say "with no swap partitions".

Revision history for this message

Master Jason (jason-rsaweb) wrote on 2010-05-05:

#33

I am having the same problem with 2 x 500GB Seagate drives. I am in the process of installing 3 machines, each using 2 drives with software raid 1. Interestingly, the raid work first time on the machines with 1TB and 250GB drives, but not on the 500GB drives. After reading this post, i see the majority of people are having problemswith 500GB drives and Imre Gergely wrote on 2010-04-27 #20 that 750GB and his 250GB drives worked fine.

So i took the 500GB drives out of the server and put 2 x 250GB drives in to test .... it worked first time.
Anybody care to explain why this bugs seems limited to 500GB drives ...

Revision history for this message

Imre Gergely (cemc) wrote on 2010-05-05:

#34

@Jason: could you test the same without RAID ? Just a single 500GB drive ? Just to see if it's at all related to RAID or not. Just put one 500GB hdd in and install it. Then maybe take it out and install it again on the other 500GB drive, just to be sure. Delete the RAID stuff, maybe a little bit of zeroing first...

Revision history for this message

Master Jason (jason-rsaweb) wrote on 2010-05-05:

#35

Ok .... here is an update .... i have software raid working on the 2x 500GB drive.

Each drives has:
1) 2GB Raid
2) 490GB Raid (left off the last 8+GB)
3) 8GB free

md0 (2 x 2GB Raid) formated as /etx4 /boot
md1 (2x 490GB Raid) formated as LVM

LVM has -
4GB formated as swap
486GB formated as ext4 as /

Revision history for this message

Master Jason (jason-rsaweb) wrote on 2010-05-05:

#36

Hey Imre Gergely, sorry i missed your earlier post, single 500GB drive works fine.

Revision history for this message

Master Jason (jason-rsaweb) wrote on 2010-05-05:

#37

Ok .... here is another update ....

We tested another machine.
Intel GT mainboard with Q9400 CPU and 4GB Kingston RAM and 2 x 500GB Seagate Drives

Each drives has:
1) 2GB Raid
2) 498GB Raid (remaining space)

md0 (2 x 2GB Raid) formated as /etx4 /boot
md1 (2x 490GB Raid) formated as LVM

LVM has -
4GB formated as swap
486GB formated as ext4 as /

The installation went fine, but when booting i was informed that md0 was running in degraded mode (only 1 drive) and that the root partion could not be found. Then i was dropped into the initramfs shell. We used the Live CD to check the raid configuration, everything was 100%, md0 had both drives and had sync'ed and md1 had both drives and continued to re-sync. We could mount the drives and everything was as it should be.

We the reinstalled the machine. During the installation we removed the existing partition and re-partioned them as below:

Each drives has:
1) 2GB Raid
2) 495GB Raid
3) 3GB Free Space (approx)

md0 (2 x 2GB Raid) formated as /etx4 /boot
md1 (2x 490GB Raid) formated as LVM

LVM has -
4GB formated as swap
486GB formated as ext4 as /

The machine booted into Ubuntu first time.

I have no idea why i cant use the full drive.
We have tested the same with Ubuntu x64 8.04, 8.10 - it is working 100%
It is not working with Ubuntu x64 9.10, 10.04RC or 10.04

Just happy to have the machines working, i am sure i wont miss 3GB

Revision history for this message

Imre Gergely (cemc) wrote on 2010-05-05:

#38

So you're saying if you leave a bit of free space at the end of the drive, it all works fine but if you don't leave any space, it won't boot?

Revision history for this message

Master Jason (jason-rsaweb) wrote on 2010-05-05:

#39

Hey Imre,
It would seem that is the case.
What is the min amount of space? I have no idea.
Only have the results from the above tests.

Revision history for this message

Imre Gergely (cemc) wrote on 2010-05-05:

#40

Can you paste the output of "fdisk -l /dev/sda" and "hdparm -I /dev/sda | head -30" of the 500GB Seagate disk? You did install Ubuntu 10.04 server 64bit, right?

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-05:

#41

I gave up on the Supermicro and last night successfully installed on a newer Dell 860 with 2 x 250GB drives. I will be installing on my 500GB drives tonight and will let you know if the problem occurs there. I've also got two 750GB drives I can try.

Revision history for this message

Thomas Krause (krause) wrote on 2010-05-05:

#42

I've got the same problems on a Dell PowerEdge T110 with four 500.1 GB disks from Seagate (ST3500320NS) when trying to combine two of them to a RAID1.

I remember that there was a error in dmesg after falling back to the initramfs console about the ext4 filesystem (something with unexpected size) and a message like

md0p1 bad geometry block count exceeds size of device

(restored from my search history today).

I post the exact error message tomorrow, when I have access to the server again, if you like. I may also try some of the workarounds in order to get the server up and running tomorrow but since it's a replacement for an existing server I could delay the workarounds for some days in order to assist in debugging this issue.

Revision history for this message

Master Jason (jason-rsaweb) wrote on 2010-05-06:

#43

Hey Guys,

I can confirm it was Ubuntu x64 10.04 (ubuntu-10.04-server-amd64.iso)

root@xxxxxxxxx:~# uname -a
Linux xxxxxxxxx 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux

root@xxxxxxxxx:~# fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00099c9c

Device Boot Start End Blocks Id System
/dev/sda1 * 1 244 1951744 fd Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2 244 59816 478515200 fd Linux raid autodetect

root@xxxxxxxxx:~# hdparm -I /dev/sda | head -30

/dev/sda:

ATA device, with non-removable media
        Model Number: ST3500418AS
        Serial Number: 9VMCY42L
        Firmware Revision: CC44
        Transport: Serial
Standards:
        Used: unknown (minor revision code 0x0029)
        Supported: 8 7 6 5
        Likely used: 8
Configuration:
        Logical max current
        cylinders 16383 16383
        heads 16 16
        sectors/track 63 63
        --
        CHS current addressable sectors: 16514064
        LBA user addressable sectors: 268435455
        LBA48 user addressable sectors: 976773168
        Logical/Physical Sector size: 512 bytes
        device size with M = 1024*1024: 476940 MBytes
        device size with M = 1000*1000: 500107 MBytes (500 GB)
        cache/buffer size = 16384 KBytes
        Nominal Media Rotation Rate: 7200
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum

Revision history for this message

Alex Kuretz (akuretz) wrote on 2010-05-06:

#44

32-bit Ubuntu 10.04 for me. I performed the same install on 500GB drives that I had done successfully on the Dell 860 with 250GB drives last night, and it failed with the Invalid Argument message. As strange as it sounds, the 500GB drives is the only common denominator. I zeroed and ran mdadm ---zero-superblock on the drives prior to performing the install.

Revision history for this message

Thomas Krause (krause) wrote on 2010-05-06:

#45

The exact error message that I found using dmesg was

EXT4-fs (md1p2): bad geometry: block count 119655152 exceeds size of device (117213680 blocks)

I also had a

md1: detected capacity change from 0 to 490107502592

and

md1: p2 size 957241344 exceeds device capacity, limited to end of disk

The md1 was configured as a system root partition with Ext4 and md0 was a Swap (which seemed to work). For installation I followed the instructions at https://help.ubuntu.com/10.04/serverguide/C/advanced-installation.html

BTW, I also used the amd64 server version of Lucid.

Revision history for this message

Thomas Krause (krause) wrote on 2010-05-06:

#46

Ok, I think found the problem and a solution/workaround:

When you ask the installer to partition the free space on the disk it will prompt you with the size of the new partition. Per default this is "500.1 GB". If you first add a smaller Swap and then a second partition this will be smaller, but still something like "482.1 GB".

I you enter by hand "500GB" (or with multiple partitions something that the sum is not bigger than 500GB) than everything works fine and as it should.

Maybe 500.1 GB *is* the right number for the 500GB drive but I somehow doubt it and blame the installer for choosing a wrong default value ;-)

Revision history for this message

midair77 (midair77) wrote on 2010-05-07:

#47

I just tried to install 10.04 amd64 server on a box with 3 500G WD and 1 500G Hitachi Drives. I wanted to setup Raid 5 and then LVM for all the partitions. I used the latest ISO files.

Raid 5 for sda1,sdb1,sdc1,sdd1
LVM named system: /, /boot, /home, /var, /tmp, /usr, /usr/local, /opt

After installation and the box rebooted, I got error and initramfs prompt.

ALERT!! /dev/mapper/system-root does not exist. Dropping to a shell!

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-2.6.32-21-server root=/dev/mapper/system-root ro quiet

cat /proc/mdstat and I saw that md0 raid 5 with sd[abcd] is resyncing...

ls /dev/mapper
control
ls /dev/md*
/dev/md0p1 /dev/md0

As you can see there is no system-root and etc under /dev/mapper.

I saw that in /etc/mdadm/mdadm.conf
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=........

ls /dev/disk/by-*
/dev/disk/by-id/ /dev/disk/by-path

As you can see there is no /dev/disk/by-uuid in /dev/disk.

Previously, I encountered similar problems when installed on a system with 2 250GB WD disks (1 big Raid1 and then LVM all partitions). I then tried to set 3 Raid 1 small partitions for swap, / and /boot and 1 big Raid1 on LVM for other partitions. With this set up, 10.04 amd64 server was able to successfully booted up.

This is a very serious bug considering that a lot of people will be running server edition with some types of RAIDx and the installation failed to correctly set these up. This bug is a show stopper and makes users pondering the quality of each Ubuntu releases.

Revision history for this message

Manny Vindiola (serialorder) wrote on 2010-05-08:

#48

I can also confirm for Ubuntu 10.04 Lucid Server (64 bit)

I tried installing with a raid 1 on two WD Black 500GB hard drives.
I get the no init found error and am dropped to a busybox shell

I have three partitions one for /boot, one for /root, and one for /swap

Like @Thomas I also get this type error

md1: detected capacity change from 0 to 490107502592

and

md1: p2 size 957241344 exceeds device capacity, limited to end of disk

and like @midair77 I also get this type of error
AERT!! /dev/md1 does not exist. Dropping to a shell!

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-2.6.32-21-server root=/dev/md1 ro quiet

I just took a look at my partition table and I think I noticed something that may be contributing to the problem:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 fd Linux raid autodetect
/dev/sda2 14 59829 480468992 fd Linux raid autodetect
/dev/sda3 59829 60802 7813120 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 59829 480468992 fd Linux raid autodetect
/dev/sdb3 59829 60802 7813120 fd Linux raid autodetect

If you notice both HDDs have a total of 60801 cylinders and on both hard drives the end cylinder of the 3rd partition is 60802 which is greater than the size of the disk.

Both of these partitions were created with the install partitioner I am resinstalling now with partitions I create myself with (s)fdisk. I will let you know if that works.

I can also confirm for Ubuntu 10.04 Lucid Server (64 bit)

I tried installing with a raid 1 on two WD Black 500GB hard drives.
I get the no init found error and am dropped to a busybox shell

I have three partitions one for /boot, one for /root, and one for /swap

Like @Thomas I also get this type error

md1: detected capacity change from 0 to 490107502592

and

md1: p2 size 957241344 exceeds device capacity, limited to end of disk

and like @midair77 I also get this type of error
AERT!! /dev/md1 does not exist. Dropping to a shell!

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-2.6.32-21-server root=/dev/md1 ro quiet

I just took a look at my partition table and I think I noticed something that may be contributing to the problem:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14       59829   480468992   fd  Linux raid autodetect
/dev/sda3           59829       60802     7813120   fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14       59829   480468992   fd  Linux raid autodetect
/dev/sdb3           59829       60802     7813120   fd  Linux raid autodetect

If you notice both HDDs have a total of 60801 cylinders and on both hard drives the end cylinder of the 3rd partition is  60802 which is greater than the size of the disk.

Both of these partitions were created with the install partitioner I am resinstalling now with partitions I create myself with (s)fdisk. I will let you know if that works.

Revision history for this message

Manny Vindiola (serialorder) wrote on 2010-05-08:

#49

I can confirm that creating the partitions by hand worked and the system is now able to boot. The current disk configuration is:

~# fdisk -l
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 fd Linux raid autodetect
/dev/sda2 14 59829 480468992 fd Linux raid autodetect
/dev/sda3 59829 60801 7813120 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 59829 480468992 fd Linux raid autodetect
/dev/sdb3 59829 60801 7813120 fd Linux raid autodetect

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2010-05-09:

#50

Colin-

We have a few people confirming this now. Any chance you could take another look?

Changed in mdadm (Ubuntu):
assignee:	nobody → Colin Watson (cjwatson)

Revision history for this message

Vladimir Smolensky (arizal) wrote on 2010-05-12:

#51

Same problems here and manually creating raids with mdadm seems to lead to same problems.

Looks like the raid driver thinks the last raid's superblock is actually the superblock for the entire disk!
In our case, we have /dev/md3 consisting of sda6, sdb6. The array was made manually with 'mdadm --create' AFTER installing the machine, not from installer! I've done this because partitioning the disk right from the installer always leaded to broken arrays after reboot.

After creating the array in the next boot we get md3 device assembled from sda and sdb and several other arrays created from
md3p1 md3p2 ... so on... a real mess
checking with 'mdadm --examine' shows that the metainfo for sda6, sdb6 is exactly the same as the one for sda, sdb! same components and UUID

If I remember right, the kernel should only auto-assemble arrays only from partitions of type FD ( Linux raid), so its unclear why it will assemble the array from sda and sdb!!

Revision history for this message

Vladimir Smolensky (arizal) wrote on 2010-05-12:

#52

And yes, our disks are 500Gb too

=== START OF INFORMATION SECTION ===
Device Model: WDC WD5001AALS-00E3A0
Serial Number: WD-WCATR0460005
Firmware Version: 05.01D05
User Capacity: 500,107,862,016 bytes

Revision history for this message

Daniel Spitaler (buz-buerotechnik) wrote on 2010-05-12:

#53

I had the same problem. An Ubuntu 10.04 installation with RAID1 on 2 500GB harddisks will not boot!

Thanks to the hint from Manny I created the raid-partitions with the installer manually and even let some space free at the end of each disk and it worked fine!

Revision history for this message

Vladimir Smolensky (arizal) wrote on 2010-05-12:

#54

Okay, my partition also seems to end at cylinder 60802 and the disk has 60801 cylinders...
I made the last partition by hand but its logical and the extended partition it is located on was made by the installer ending at 60802...

Dustin Kirkland  (kirkland) on 2010-05-12

Changed in mdadm (Ubuntu Lucid):
assignee:	nobody → Colin Watson (cjwatson)

Revision history for this message

Ky Weichel (kweichel) wrote on 2010-05-19:

#55

I can confirm this issue as well, on a Dell PowerEdge R210 with two 500GB drives.

Device Model: WDC WD5002ABYS-18B1B0
Serial Number: WD-WCASYC640636
Firmware Version: 02.03B04
User Capacity: 500,107,862,016 bytes

Device Model: WDC WD5002ABYS-18B1B0
Serial Number: WD-WCASYC631505
Firmware Version: 02.03B04
User Capacity: 500,107,862,016 bytes

The boot problem was the same (dumped to initramfs prompt on reboot after install). I found the same cylinder 60802 anomaly with the partition tables when I used the installer's partitioner to create them.

My partitions consisted of a 484GB main partition (marked bootable) and a 16GB swap partition on each disk, all set to type FD (Linux RAID autodetect) and added to /dev/md0 and /dev/md1 respectively. md0 was then formatted ext4 and mounted to /, and md1 was set to swap.

To workaround the issue, I created my partitions with fdisk instead (alt-switched to another tty during partitioning step) and then created my RAID sets in the installer's partitioner. As a result I could still create partitions that filled the whole drive. This worked and the system now boots properly.

My partition table looks like this:

root@belair-auto2:/# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b35fa

Device Boot Start End Blocks Id System
/dev/sda1 * 1 58843 472654848 fd Linux raid autodetect
/dev/sda2 58843 60801 15728160+ fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c1204

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 58843 472654848 fd Linux raid autodetect
/dev/sdb2 58843 60801 15728160+ fd Linux raid autodetect

(kweichel...md devices snipped)

I can confirm this issue as well, on a Dell PowerEdge R210 with two 500GB drives.

Device Model:     WDC WD5002ABYS-18B1B0
Serial Number:    WD-WCASYC640636
Firmware Version: 02.03B04
User Capacity:    500,107,862,016 bytes

Device Model:     WDC WD5002ABYS-18B1B0
Serial Number:    WD-WCASYC631505
Firmware Version: 02.03B04
User Capacity:    500,107,862,016 bytes

The boot problem was the same (dumped to initramfs prompt on reboot after install).  I found the same cylinder 60802 anomaly with the partition tables when I used the installer's partitioner to create them.

My partitions consisted of a 484GB main partition (marked bootable) and a 16GB swap partition on each disk, all set to type FD (Linux RAID autodetect) and added to /dev/md0 and /dev/md1 respectively.  md0 was then formatted ext4 and mounted to /, and md1 was set to swap.

To workaround the issue, I created my partitions with fdisk instead (alt-switched to another tty during partitioning step) and then created my RAID sets in the installer's partitioner.  As a result I could still create partitions that filled the whole drive.  This worked and the system now boots properly.

My partition table looks like this:

root@belair-auto2:/# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b35fa

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       58843   472654848   fd  Linux raid autodetect
/dev/sda2           58843       60801    15728160+  fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c1204

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       58843   472654848   fd  Linux raid autodetect
/dev/sdb2           58843       60801    15728160+  fd  Linux raid autodetect

(kweichel...md devices snipped)

Revision history for this message

Mike Perry (mike.perry) wrote on 2010-05-21:

#56

Wow, I thought I was going crazy until I saw this bug. Here is my setup:

Two identical 500G Seagate drives. I sliced each of them up into 3 partitions using the alternative installer.

sda1, sdb2 - 1G - raid1 as md0 for boot
sda2, sdb2 - 5G - swap
sda3, sdb3 - remaining space as prompted - raid1 as md1 for LVM

The install appears to go fine, in fact, I didn't really notice there was a problem until I examined my boot logs and found that my swap on sdb2 wasn't found. When inspecting the drives I found that I had sda1,2,3, but there were no partitions on sdb. Instead, I have md1p1,p2,p3.

I'm surprised the system was even working. I experimented with a few different configurations and got similar results.

Revision history for this message

Mike Perry (mike.perry) wrote on 2010-05-22:

#57

I was able to get raid1 working directly from the install by following the advice in this thread. Specifically, I manually paritioned my drives with fdisk, and did not have any raid partition fill up the entire disk.

Revision history for this message

Ky Weichel (kweichel) wrote on 2010-05-23:

#58

Mike, if you're using fdisk, you actually _can_ have your partitions fill the whole disk. That's what I was trying to convey in my post.

Revision history for this message

Ilya Almametov (ilya-almametov) wrote on 2010-06-01:

#59

I also can confirm that partitioning drives manually from another tty during installation solves the problem.

Revision history for this message

Lars Steinke (lss) wrote on 2010-06-04:

#60

While I encountered a similar problem when upgrading from 8.04 LTS, my observations might possibly prove helpful for fresh installs as well:
- 10.04 dropped me to the initramfs shell with: "ALERT!! /dev/md0 does not exist. Dropping to a shell!"
- I was then able to continue booting after issuing "mdadm --auto-detect; exit"
- A subsequent "grub-install /dev/md0" fixed the initramsfs problem with /dev/md0 for me.
Please note this is still grub 0.97, as that doesn't seem to be upgraded automatically...

Revision history for this message

Stu Thompson (stu-comp) wrote on 2010-06-11:

#61

I've also had the same issue, but with smaller disks: WD RE3 250GB (WD2502ABYS)

Using the suggested sizes when creating the md* devices (a single ext3 / partition + one swap partition) resulted with the "Invalid argument" error message and the initramfs prompt.

Manually defining the size to be slightly less than the suggested size worked like a charm.

Stu

Revision history for this message

RichardN (richardn) wrote on 2010-06-20:

#62

I have this exact same issue. 500.1GB seagate disks and 64bit 10.04 server. Have tried installing in a couple of different configurations with no luck. The first time I got the "mount: mounting /dev/disk/by-uuid/<uuid> on /root/ failed: Invalid argument" message and I noticed that gub-mkconfig was detecting a uuid that as far as I can tell did not exist on the system. So I changed /etc/default/grub so that I didn't pass uuid as a parameter. After that it wouldn't mount /proc.
The system works fine when installed on only one disk. I'll have one or two more goes at installing using some of the ideas here. If it still doesn't work I'll have to give up and use debian.

Revision history for this message

Frederic Van Espen (frederic-ve) wrote on 2010-07-02:

#63

Same issue here. I have 2 500 GB WD disks and was trying to install 10.04 server amd64. When partitioning with the installer partitioner, the ending cilinder of the last partition was set to 60802 while the disks physically only have 60801.

Manually partitioning with fdisk in another tty solved the issue for me.

Revision history for this message

Lobo (alex-loffler) wrote on 2010-07-04:

#64

Download full text (3.6 KiB)

Confirmed - after multiple install attempts resulting in /dev/md definition weirdness and a (initramfs) prompt, I finally have 10.04 installed on RAID1 /dev/md partitions. Tried the repartition manually just before installing - no joy. I wiped the disks clean then used the installer and manually set the last partition to be ~1GB smaller than the partitioner was suggesting and _bang_ it worked flawlessly. Here are some details about the setup - you guessed it 2 x 500GB Seagate disks...:

lobo@test:~$ uname -a
Linux test 2.6.32-23-generic #37-Ubuntu SMP Fri Jun 11 07:54:58 UTC 2010 i686 GNU/Linux

/dev/sda:

Model=ST3500320AS, FwRev=SD15, SerialNo=9QM7HLTS
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=no WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-4,5,6,7

/dev/sdb:

Model=ST3500320AS, FwRev=SD15, SerialNo=9QM63DNQ
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=no WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-4,5,6,7

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000693db

Device Boot Start End Blocks Id System
/dev/sda1 * 1 132 1060258+ fd Linux raid autodetect
/dev/sda2 133 394 2104515 fd Linux raid autodetect
/dev/sda3 395 60575 483397632 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a1c27

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 132 1060258+ fd Linux raid autodetect
/dev/sdb2 133 394 2104515 fd Linux raid autodetect
/dev/sdb3 395 60575 483397632 fd Linux raid autodetect

lobo@test:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda3[0] sdb3[1]
483397568 blocks [2/2] [UU]
[==================>..] resync = 94.6% (457681664/483397568) finish=10.1min ...

Confirmed - after multiple install attempts resulting in /dev/md definition weirdness and a (initramfs) prompt, I finally have 10.04 installed on RAID1 /dev/md partitions. Tried the repartition manually just before installing - no joy. I wiped the disks clean then used the installer and manually set the last partition to be ~1GB smaller than the partitioner was suggesting and _bang_ it worked flawlessly. Here are some details about the setup - you guessed it 2 x 500GB Seagate disks...:
 
lobo@test:~$ uname -a
Linux test 2.6.32-23-generic #37-Ubuntu SMP Fri Jun 11 07:54:58 UTC 2010 i686 GNU/Linux

/dev/sda:

Model=ST3500320AS, FwRev=SD15, SerialNo=9QM7HLTS
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

/dev/sdb:

Model=ST3500320AS, FwRev=SD15, SerialNo=9QM63DNQ
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000693db

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         132     1060258+  fd  Linux raid autodetect
/dev/sda2             133         394     2104515   fd  Linux raid autodetect
/dev/sda3             395       60575   483397632   fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a1c27

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         132     1060258+  fd  Linux raid autodetect
/dev/sdb2             133         394     2104515   fd  Linux raid autodetect
/dev/sdb3             395       60575   483397632   fd  Linux raid autodetect

lobo@test:~$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md2 : active raid1 sda3[0] sdb3[1]
      483397568 blocks [2/2] [UU]
      [==================>..]  resync = 94.6% (457681664/483397568) finish=10.1min speed=41989K/sec
      
md1 : active raid1 sda2[0] sdb2[1]
      2104448 blocks [2/2] [UU]
      
md0 : active raid1 sdb1[1] sda1[0]
      1060160 blocks [2/2] [UU]
      
unused devices: <none>

lobo@test:~$ mount | grep ext4
/dev/md2 on / type ext4 (rw,errors=remount-ro)
/dev/md0 on /boot type ext4 (rw)

lobo@hal:~$ swapon -s
Filename				Type		Size	Used	Priority
/dev/md1                                partition	2104440	0	-1

Thanks again folks - this was driving me nuts! Hope this helps narrow down the cause.

Steve Langasek (vorlon) on 2010-07-05

Changed in mdadm (Ubuntu Lucid):
milestone:	ubuntu-10.04 → none
Changed in mdadm (Ubuntu):
milestone:	ubuntu-10.04 → none

Revision history for this message

ceg (ceg) wrote on 2010-07-08:

#65

May this be related? (500GB disk): Bug #599515 mdadm misdetects disks instead of partition

Revision history for this message

Mattias Toom (matthew-toom) wrote on 2010-07-15:

#66

Hi all,

I can confirm this bug and was lucky enough to find the bug report for it. I installed Ubuntu Server 10.04 64-bit on a machine with multiple RAID1 configurations (mdadm) working and was attempting to build a RAID1 with 2x 500gb drives for the OS.

Couldn't figure out why a fresh install of the OS wouldn't work and was getting worried since I have mdadm running a 2x2tb, and a pair of 2x1tb RAID1's. I thought the error might be relating to all of those RAID configurations confusing the installer somehow. It wasn't.

I got the error messages mentioned above.

It looks like there was an error partitioning the drives; after carefully manually partitioning the drives (USING the installer, I suppose this could have been done with fdisk) I left 100 megabytes of free space as per the post of Thomas Krause. The other change I made was NO to "do you want the system to boot if one of the OS drives is degraded?". I think it was the partitioning change that fixed the problem.

Anyways, after the 2nd install Ubuntu Server boots fine, automatically recognizes all my arrays and life is beautiful.

M. Toom BSc(comp.sci)

Revision history for this message

Mattias Toom (matthew-toom) wrote on 2010-07-15:

#67

Further, to add some specificity, the free space I left was at the end of the drive. So, I did about 8 gb for swap, then the remaining size of the disk minus 0.1 gb for the bootable ext4 partition.

Thanks to all for creating this thread (and contributing to it) to document the bug, it was very useful for me and I was able to fix the issue fairly quickly after finding it.

Revision history for this message

Unlogic (unlogic-unlogic) wrote on 2010-08-15:

#68

I can also confirm this bug. I recently installed two servers both with two 500GB drives configured using raid 1 with Linux software raid.

The installation goes smooth but when I reboot I end up with a busybox prompt and a nonworking raid setup.

If I partition the drives and create the raid using a Mandriva 2010.1 disc and then install Ubuntu 10.04 on top pf those partitions it works just fine.

I'm very surprised to see that the mighty Ubuntu distro is having such serious bugs, I hope this gets sorted out quickly!

Btw. Here is a whole thread at the forums discussing this issue: http://ubuntuforums.org/showthread.php?t=1474950

Revision history for this message

Plutocrat (plutocrat) wrote on 2010-08-21:

#69

Confirmed here as well. I couldn't believe that such a fundamental bug would be released, but really, yes, my RAID system is unbootable after using the installer. Two and a half days wasted.
I've got an IBM 3200 server with four 500Gb disks. My partitioning scheme is as follows.
On each disk:
/boot 1Gb
/ 15Gb
swap 2Gb
/home 482Gb
Then I have a
md0 RAID 1 on the four boot partitions (sd[abcd]1),
md1 RAID 5 across the / partitions (sd[abcd]2) ,
md2 RAID5 across the /home partittions (sd[abcd]4)

The /etc/mdadm/mdadm.conf file looks correct before I reboot.

After install I get the initramfs prompt.

doing cat /proc/mdstat tells me only the md2 array is detected and its rebuilding.
mdadm --detail /dev/md2 tells me that it is compose of sda, sdb, sdc, sdd ie the WHOLE disks

I can do mdadm --stop /dev/md2
and then.
mdadm --assemble /dev/md0 /dev/sd[abcd]1
mdadm --assemble /dev/md1 /dev/sd[abcd]2
mdadm --assemble /dev/md2 /dev/sd[abcd]4

After this the md2 only adds 3 of the 4 disks, but I can now boot by typing exit to get out of the initramfs.

I've tried all my tricks to get this config to stick but apparently every reboot I have to manually stop the wrong array and manually assemble them all again.

This is the amd/64 iso for ubuntu server 10.04. I also had the same problem with the 386 version.

I've tried vaping the partition tables, reformatting the drives, and zeroing the superblock and none of these fix it. Two and a half days. Is the bug in mdadm, the partitioner, or what?

Revision history for this message

Plutocrat (plutocrat) wrote on 2010-08-21:

#70

PS if anyone can suggest ways I can get the manually assembled arrays to 'stick' I'd be grateful. I'm not sure I could go through another install.

Revision history for this message

Guido Scalise (guido-scalise) wrote on 2010-08-22:

#71

After an entire day lost trying to install 10.04.1 LTS on a brand new Dell PowerEdge R210 with two 500Gb drives, I found this bug report and followed Thomas Krause workaround (creating the last partition with 100Mb less, thus leaving 100Mb unused) and was finally able to boot.

I can't believe such a gross bug went released. An entire day of work lost.

It should also be noted that during installation, grub offers to automatically install on one of the disks' MBR (/dev/sda in my case), when the correct thing to do would be to install it on both /dev/sda and /dev/sdb

Revision history for this message

Ky Weichel (kweichel) wrote on 2010-08-23:

#72

The bug is in the installer's partitioner, NOT in mdadm. As previously stated, the partitions are being created by the installer with an end cylinder number that is one greater than the actual end of the disk.

All you have to do is create your _partitions_ somewhere other than the Ubuntu Server Installer. You can use a GParted disc, boot to your favourite Live CD and fdisk them, or whatever is easiest for you. Just make sure you set their type to Linux Raid Autodetect (that's type "FD" in fdisk).

That way you can even make them take up your whole disk and you don't have to waste 100MB of space.

You can then fire up the Ubuntu Server Installer and create your RAID sets on the partitions you made elsewhere.

Revision history for this message

Unlogic (unlogic-unlogic) wrote on 2010-08-23:

#73

I created a script that wipes my disks of any superblocks and partitions and then recreates the partitions again (using a stored partition table with sfdisk) and the raid devices using mdadm.

If I start the Ubuntu installer, switch to another console and run the script and then proceed to install Ubuntu on the created partitions this bug occurs and the system gets stuck on first boot.

If I boot up another Linux install disc, run the script and then start the Ubuntu installer again and install Ubuntu on the created partitions all works fine.

So I'm not entirely sure that the bug is in the installer's partitioner. I my case above I didn't touch the partitioner and used the following partition table with sfdisk instead:

# partition table of /dev/sda
unit: sectors

/dev/sda1 : start= 2048, size= 19529728, Id=fd, bootable
/dev/sda2 : start= 19531776, size=947265536, Id=fd
/dev/sda3 : start=966797312, size= 9975808, Id=fd
/dev/sda4 : start= 0, size= 0, Id= 0

Revision history for this message

Plutocrat (plutocrat) wrote on 2010-08-24:

#74

@Ky Thanks for the feedback. I was a little jaded when I wrote my post. I bit the bullet on Monday morning and went through the install again. First of all I booted into a Gparted Live CD and wiped all RAID Arrays and partitions. I rebooted into it again to check, and had to remove a new RAID array md127? which had been created. After that the disks were clear.

I then ran the installer, creating my partitions , but leaving a space at the end of the disk, after the last partition. In my case this was /dev/sd[abcd]4, and I left 1Gb (although I gather less will also work). The RAID arrays assembled OK and I could reboot after install. So, just confirming the workaround ...

Revision history for this message

Plutocrat (plutocrat) wrote on 2010-08-24:

#75

@Guido - I think grub will offer to install on all partitions marked as 'boot'. In my case this was all four /dev/sd[abcd]1 partitions, so it offered to install on all of them.

Revision history for this message

Keith Cornwell (apex-thing2) wrote on 2010-08-24:

#76

I opened bug #605720 for what I think is the root cause of this problem. I don't know how the installer does partitioning but it exhibits the same behavior.
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/605720

Revision history for this message

Doug Jones (djsdl) wrote on 2010-09-03:

#77

This bit me too. Two 500GB drives, RAID1, using 10.04.1 alternate 386 installer.

Reading through all these comments, and those on similar (possibly related) bugs, it seems like this is caused by an arithmetic error in some code that figures out where things ought to be on the disk.

Suddenly I am reminded of another arithmetic error that cropped up in gparted recently, relating to the switchover from align-to-cylinder to align-to-megabyte. Didn't the default partition alignment method just change in Lucid?

Very suspicious...

Thierry Carrez (ttx) on 2010-09-03

tags:

added: server-mro

Revision history for this message

Jean-Luc Boss (jlb74) wrote on 2010-09-03:

#78

Had the same error with a 64 bit install on a dell optiplex 780 with 2 500Gb disks...
the rounding up the ".1" workaround worked for me

Revision history for this message

D_A_N_K_O (russian-robotic) wrote on 2010-09-07:

#79

A have the same problem with 2x500Gb Seagate HDD . I can`t create raid 1
Ubuntu Server 10.04 with all updates

/dev/sda:

ATA device, with non-removable media
Model Number: GB0500EAFJH
Serial Number: 9QM8L22V
Firmware Revision: HPG6
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 4a
Supported: 7 6 5 4 & some of 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976773168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
device size with M = 1024*1024: 476940 MBytes
device size with M = 1000*1000: 500107 MBytes (500 GB)
cache/buffer size = unknown
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = ?

/dev/sda:

ATA device, with non-removable media
Model Number: GB0500EAFJH
Serial Number: 9QM8L22V
Firmware Revision: HPG6
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 4a
Supported: 7 6 5 4 & some of 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976773168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
device size with M = 1024*1024: 476940 MBytes
device size with M = 1000*1000: 500107 MBytes (500 GB)
cache/buffer size = unknown
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = ?

A have  the same problem with   2x500Gb  Seagate  HDD .   I can`t create   raid 1      
Ubuntu Server  10.04  with all updates

/dev/sda:

ATA device, with non-removable media
	Model Number:       GB0500EAFJH                             
	Serial Number:      9QM8L22V            
	Firmware Revision:  HPG6    
Standards:
	Used: ATA/ATAPI-7 T13 1532D revision 4a 
	Supported: 7 6 5 4 & some of 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
	cache/buffer size  = unknown
	Nominal Media Rotation Rate: 7200
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = ?

/dev/sda:

ATA device, with non-removable media
	Model Number:       GB0500EAFJH                             
	Serial Number:      9QM8L22V            
	Firmware Revision:  HPG6    
Standards:
	Used: ATA/ATAPI-7 T13 1532D revision 4a 
	Supported: 7 6 5 4 & some of 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
	cache/buffer size  = unknown
	Nominal Media Rotation Rate: 7200
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = ?

Revision history for this message

Alexandr (olexandr-dmitriev) wrote on 2010-09-10:

#80

The sames issue with me:10.04.1 - 2 500GB WD disks, RAID1 and Busybox after install. Manual partitioning reducing the last one by 100 MB helped... Very sad.

Revision history for this message

Luigi Messina (grimmo) wrote on 2010-09-19:

#81

Download full text (4.2 KiB)

I'm having very similar simptoms after installing 10.04.1 from scratch
with two 500Gb disks (WD and ST).
The system installs and boots correctly if the raid1 array is created manually
from CLI before partitions detection.
But after some hours of uptime, errors start appearing in logs and the array becomes degraded:

Sep 19 13:36:19 deepthought kernel: [ 278.248022] ata3.00: qc timeout (cmd 0x27)
Sep 19 13:36:19 deepthought kernel: [ 278.248027] ata3.00: failed to read native max address (err_mask=0x4)
Sep 19 13:36:19 deepthought kernel: [ 278.248033] ata3.00: disabled
Sep 19 13:36:19 deepthought kernel: [ 278.248039] ata3.00: device reported invalid CHS sector 0
Sep 19 13:36:19 deepthought kernel: [ 278.248049] ata3: hard resetting link
Sep 19 13:36:20 deepthought kernel: [ 279.128035] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 19 13:36:20 deepthought kernel: [ 279.128048] ata3: EH complete
Sep 19 13:36:20 deepthought kernel: [ 279.128057] sd 2:0:0:0: [sdb] Unhandled error code
Sep 19 13:36:20 deepthought kernel: [ 279.128059] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 19 13:36:20 deepthought kernel: [ 279.128062] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 3a 38 5f 88 00 00 08 00
Sep 19 13:36:20 deepthought kernel: [ 279.128082] md: super_written gets error=-5, uptodate=0
Sep 19 13:36:20 deepthought kernel: [ 279.128105] sd 2:0:0:0: [sdb] Unhandled error code
Sep 19 13:36:20 deepthought kernel: [ 279.128106] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 19 13:36:20 deepthought kernel: [ 279.128109] sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 06 a2 3c 80 00 00 20 00
Sep 19 13:36:20 deepthought kernel: [ 279.205366] RAID1 conf printout:
Sep 19 13:36:20 deepthought kernel: [ 279.205369] --- wd:1 rd:2
Sep 19 13:36:20 deepthought kernel: [ 279.205371] disk 0, wo:0, o:1, dev:sda
Sep 19 13:36:20 deepthought kernel: [ 279.205373] disk 1, wo:1, o:0, dev:sdb
Sep 19 13:36:20 deepthought kernel: [ 279.212009] RAID1 conf printout:
Sep 19 13:36:20 deepthought kernel: [ 279.212011] --- wd:1 rd:2
Sep 19 13:36:20 deepthought kernel: [ 279.212013] disk 0, wo:0, o:1, dev:sda

also in dmesg this message is present at every boot:

[ 3.022033] md1: p5 size 976269312 exceeds device capacity, limited to end of disk

These are the partitions as seen from sfdisk:

~$ sudo sfdisk -l /dev/sda

Disk /dev/sda: 30401 cylinders, 255 heads, 63 sectors/track
Warning: The partition table looks like it was made
for C/H/S=*/81/63 (instead of 30401/255/63).
For this listing I'll assume that geometry.
Units = cylinders of 2612736 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sda1 0+ 95707- 95708- 244197560 83 Linux
end: (c,h,s) expected (1023,80,63) found (705,80,63)
/dev/sda2 0 - 0 0 0 Empty
/dev/sda3 0 - 0 0 0 Empty
/dev/sda4 0 - 0 0 0 Empty

~$ sudo sfdisk -l /dev/sdb

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS a...

I'm having very similar simptoms after installing 10.04.1 from scratch
with two 500Gb disks (WD and ST).
The system installs and boots correctly if the raid1 array is created manually
from CLI before partitions detection.
But after some hours of uptime, errors start appearing in logs and the array becomes degraded:

Sep 19 13:36:19 deepthought kernel: [  278.248022] ata3.00: qc timeout (cmd 0x27)
Sep 19 13:36:19 deepthought kernel: [  278.248027] ata3.00: failed to read native max address (err_mask=0x4)
Sep 19 13:36:19 deepthought kernel: [  278.248033] ata3.00: disabled
Sep 19 13:36:19 deepthought kernel: [  278.248039] ata3.00: device reported invalid CHS sector 0
Sep 19 13:36:19 deepthought kernel: [  278.248049] ata3: hard resetting link
Sep 19 13:36:20 deepthought kernel: [  279.128035] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 19 13:36:20 deepthought kernel: [  279.128048] ata3: EH complete
Sep 19 13:36:20 deepthought kernel: [  279.128057] sd 2:0:0:0: [sdb] Unhandled error code
Sep 19 13:36:20 deepthought kernel: [  279.128059] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 19 13:36:20 deepthought kernel: [  279.128062] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 3a 38 5f 88 00 00 08 00
Sep 19 13:36:20 deepthought kernel: [  279.128082] md: super_written gets error=-5, uptodate=0
Sep 19 13:36:20 deepthought kernel: [  279.128105] sd 2:0:0:0: [sdb] Unhandled error code
Sep 19 13:36:20 deepthought kernel: [  279.128106] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 19 13:36:20 deepthought kernel: [  279.128109] sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 06 a2 3c 80 00 00 20 00
Sep 19 13:36:20 deepthought kernel: [  279.205366] RAID1 conf printout:
Sep 19 13:36:20 deepthought kernel: [  279.205369]  --- wd:1 rd:2
Sep 19 13:36:20 deepthought kernel: [  279.205371]  disk 0, wo:0, o:1, dev:sda
Sep 19 13:36:20 deepthought kernel: [  279.205373]  disk 1, wo:1, o:0, dev:sdb
Sep 19 13:36:20 deepthought kernel: [  279.212009] RAID1 conf printout:
Sep 19 13:36:20 deepthought kernel: [  279.212011]  --- wd:1 rd:2
Sep 19 13:36:20 deepthought kernel: [  279.212013]  disk 0, wo:0, o:1, dev:sda

also in dmesg this message is present at every boot:

[    3.022033] md1: p5 size 976269312 exceeds device capacity, limited to end of disk

These are the partitions as seen from sfdisk:

~$ sudo sfdisk -l /dev/sda

Disk /dev/sda: 30401 cylinders, 255 heads, 63 sectors/track
Warning: The partition table looks like it was made
  for C/H/S=*/81/63 (instead of 30401/255/63).
For this listing I'll assume that geometry.
Units = cylinders of 2612736 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1          0+  95707-  95708- 244197560   83  Linux
		end: (c,h,s) expected (1023,80,63) found (705,80,63)
/dev/sda2          0       -       0          0    0  Empty
/dev/sda3          0       -       0          0    0  Empty
/dev/sda4          0       -       0          0    0  Empty

~$ sudo sfdisk -l /dev/sdb

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdb1          0+     31-     32-    249856   fd  Linux raid autodetect
/dev/sdb2         31+  60801-  60770- 488134657    5  Extended
/dev/sdb3          0       -       0          0    0  Empty
/dev/sdb4          0       -       0          0    0  Empty
/dev/sdb5         31+  60801-  60770- 488134656   fd  Linux raid autodetect

This is /proc/mdstat output instead:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdc1[1] md1p1[0]
      249792 blocks [2/2] [UU]
      
md1 : active raid1 sdb[0]
      488134592 blocks [2/1] [U_]
      bitmap: 114/233 pages [456KB], 1024KB chunk

I'm at the third/fourth reinstall attempt (previously I've had the bad idea of using raid1+luks+lvm now I've
switchet to plaintext) but I'm still having stability issues.
Can somebody confirm whether I'm hitting this bug?

Revision history for this message

Luigi Messina (grimmo) wrote on 2010-09-19:

#82

pardon, my system has decided to switch device names when the array became degraded, the correct fdisk output for the other disk is this one:

Disk /dev/sdc: 60801 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sdc1 * 0+ 31- 32- 249856 fd Linux raid autodetect
/dev/sdc2 31+ 60801- 60770- 488134657 5 Extended
/dev/sdc3 0 - 0 0 0 Empty
/dev/sdc4 0 - 0 0 0 Empty
/dev/sdc5 31+ 60801- 60770- 488134656 fd Linux raid autodetect

Revision history for this message

Colin Watson (cjwatson) wrote on 2010-09-28:

#83

I finally got back to this bug and figured out what was going on. It took a while ...

A few people suggested that what was happening was that the partitioner was creating partitions that extended beyond the end of the disk. That wasn't actually quite right if you looked at the logs in detail and did the arithmetic; they were entirely within the disk, just extending onto the last (incomplete) cylinder, and there's nothing wrong with that in itself. However, there were log messages indicating that the md layer in the kernel thought that an md device was overflowing the disk, and this pointed me in the right direction.

When I tried to fix this bug before, I observed that what was happening was that mdadm was getting confused between /dev/sda and /dev/sda1 (or whatever the last partition happened to be). Since the 0.90 metadata format stores the superblock at the end of the device, there's obvious potential for confusion between a partition extending all the way to the end of the disk and the disk device itself. I fixed this, or so I thought, by constraining the installer's partitioner to never use the last sector of the disk. This fixed the problem in my tests.

Unfortunately, I apparently didn't quite do enough research on exactly what was happening. When I came back to this bug, I read the md(4) manual page, and found this:

  The common format - known as version 0.90 - has a superblock that is 4K long
  and is written into a 64K aligned block that starts at least 64K and less
  than 128K from the end of the device (i.e. to get the address of the superblock round the size of the device down to a multiple of 64K and then subtract 64K).

(The 1.0 superblock format is similar, but is never more than 12K from the end of the device, so a fix for 0.90 will fix 1.0 too. 1.1 and 1.2 store their superblocks at or near the start of the device, and do not suffer from this problem.)

So, if you do the mathematics based on partman's current constraints, the result is that Ubuntu will currently get this wrong for any disk whose size is an exact multiple of 1048576 bytes plus any number between 512 and 65536. The 500GB disks common among commenters on this bug report are, according to the logs, 500107862016 bytes long, which is 476940 * 1048576 + 24576. I could never reproduce this in KVM before because my habit is to create disk images which are an exact number of megabytes (I usually just say '10G' or thereabouts), and such an image would never encounter this bug thanks to my previous attempted fix of avoiding the last sector.

The proper fix, then, is for partman to round the disk size down to 64K, subtract one further sector, and avoid any sectors after that.

I finally got back to this bug and figured out what was going on.  It took a while ...

A few people suggested that what was happening was that the partitioner was creating partitions that extended beyond the end of the disk.  That wasn't actually quite right if you looked at the logs in detail and did the arithmetic; they were entirely within the disk, just extending onto the last (incomplete) cylinder, and there's nothing wrong with that in itself.  However, there were log messages indicating that the md layer in the kernel thought that an md device was overflowing the disk, and this pointed me in the right direction.

When I tried to fix this bug before, I observed that what was happening was that mdadm was getting confused between /dev/sda and /dev/sda1 (or whatever the last partition happened to be).  Since the 0.90 metadata format stores the superblock at the end of the device, there's obvious potential for confusion between a partition extending all the way to the end of the disk and the disk device itself.  I fixed this, or so I thought, by constraining the installer's partitioner to never use the last sector of the disk.  This fixed the problem in my tests.

Unfortunately, I apparently didn't quite do enough research on exactly what was happening.  When I came back to this bug, I read the md(4) manual page, and found this:

The  common format - known as version 0.90 - has a superblock that is 4K long
  and is written into a 64K aligned block that starts at least 64K and less
  than 128K from the end of the device (i.e. to get the address of the superblock round the size of the device down to a multiple of 64K and then subtract 64K).

(The 1.0 superblock format is similar, but is never more than 12K from the end of the device, so a fix for 0.90 will fix 1.0 too.  1.1 and 1.2 store their superblocks at or near the start of the device, and do not suffer from this problem.)

So, if you do the mathematics based on partman's current constraints, the result is that Ubuntu will currently get this wrong for any disk whose size is an exact multiple of 1048576 bytes plus any number between 512 and 65536.  The 500GB disks common among commenters on this bug report are, according to the logs, 500107862016 bytes long, which is 476940 * 1048576 + 24576.  I could never reproduce this in KVM before because my habit is to create disk images which are an exact number of megabytes (I usually just say '10G' or thereabouts), and such an image would never encounter this bug thanks to my previous attempted fix of avoiding the last sector.

The proper fix, then, is for partman to round the disk size down to 64K, subtract one further sector, and avoid any sectors after that.

Colin Watson (cjwatson) on 2010-09-28

Changed in grub2 (Ubuntu):
status:	New → Invalid
Changed in grub2 (Ubuntu Lucid):
status:	New → Invalid
affects:	mdadm (Ubuntu) → partman-base (Ubuntu)
Changed in partman-base (Ubuntu):
status:	Confirmed → In Progress

Revision history for this message

Colin Watson (cjwatson) wrote on 2010-09-28:

#84

I'll upload a fix for 10.04.2 once we've tested this for 10.10.

summary:	- mount: mounting /dev/md0 on /root/ failed: Invalid argument + partman sometimes creates partitions such that there is ambiguity + between whether the superblock is on the disk device or the partition + device
description:	updated
Changed in partman-base (Ubuntu):
status:	In Progress → Fix Committed
Changed in partman-base (Ubuntu Lucid):
status:	Confirmed → Triaged
status:	Triaged → Fix Committed

Colin Watson (cjwatson) on 2010-09-28

Changed in partman-base (Ubuntu Maverick):
milestone:	none → ubuntu-10.10
Changed in partman-base (Ubuntu Lucid):
milestone:	none → ubuntu-10.04.2

Revision history for this message

Jonathan Vaslin (jonathan-vaslin) wrote on 2010-09-30:

#85

I struggled a lot, glad to read Colin Watson post. I have a Dell R210, with a S100 software raid controller and 2 500 GB hard disks. I thought the problem was coming from the S100 card all along, finally read all the explanations. Thanks a lot M. Watson !

Revision history for this message

Launchpad Janitor (janitor) wrote on 2010-10-01:

#86

This bug was fixed in the package partman-base - 141ubuntu2

---------------
partman-base (141ubuntu2) maverick; urgency=low

  * Expand the small gap we leave at the end of the disk to avoid MD
    superblock ambiguity so that it correctly covers the region where
    ambiguity might arise. The previous gap was insufficient on disks that
    were between 512 and 65535 bytes larger than a multiple of 1048576 bytes
    (LP: #569900).
-- Colin Watson <email address hidden> Tue, 28 Sep 2010 21:17:07 +0100

Changed in partman-base (Ubuntu Maverick):
status:	Fix Committed → Fix Released

Revision history for this message

Thomas (teclab-at) wrote on 2010-10-16:

#87

I would like to have you a look at this one:
https://bugs.launchpad.net/unetbootin/+bug/661820

Revision history for this message

Curtis Dutton (curtdutt) wrote on 2010-10-27:

#88

I was able to fix this problem after the failed install and avoid the re-install. Ubuntu Server 10.10

I also have the 500GB drives with a RAID1.

My partitions were done as default with /dev/sda1 and /dev/sdb1 as the root partitions and /dev/sda5 and /dev/sdb5 as the swap partitions.

I shrunk the swap partitions (/dev/sda5 and /dev/sdb5) by 2 "Units" using fdisk and then zeroed the raid superblocks.

The steps I took.

Load up the rescue disk and mount the installer root. In my case it was seeing md5 and md5p1 as my raid devices.

1. run "mdadm --stop /dev/md5"
2. fdisk /dev/sda
3. delete partition 5 (/dev/sd5)
4. create a new logical partition 5
5. use the default start
6. used End - 2 (in my case 60802 - 2 = 60800) as the End
7. change type to "fd" for "Linux raid autodetect"
8-14. reapeat steps 2 - 7 with /dev/sdb
15. run mdadm --zero-superblock /dev/sda
16. run mdadm --zero-superblock /dev/sdb
17. reboot

After getting this to work, and realizing that a block is about 8MB, I realize this solution loses out on 16MB on my disks. I'm sure a larger end value might work as well but who cares. It boots now.

Revision history for this message

scrapper (microcontrollerfreak) wrote on 2010-11-27:

#89

similar bug here: https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/612224

well i have to say i have 500GB HDD as well, but i think thats just by accident.

as you can read here:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/612224

i solved the problem of not beeing able to boot, by not putting SWAP into RAID 1.

greetings
scrapper

Colin Watson (cjwatson) on 2010-12-06

Changed in partman-base (Ubuntu Lucid):
status:	Fix Committed → In Progress
description:	updated

Revision history for this message

Martin Pitt (pitti) wrote on 2010-12-10: Please test proposed package

#90

Accepted partman-base into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in partman-base (Ubuntu Lucid):
status:	In Progress → Fix Committed
tags:	added: verification-needed

Revision history for this message

Martin Pitt (pitti) wrote on 2011-01-25:

#91

Any testers of the lucid-proposed package? Please note that we have daily built lucid CDs which include -proposed, which allow you to test this. Thanks!

Revision history for this message

Imre Gergely (cemc) wrote on 2011-01-25:

#92

Did the test on Lucid like this:

- created two identical qcow2 images:
kvm-img create -f qcow2 disk1.qcow2 1048576512
kvm-img create -f qcow2 disk2.qcow2 1048576512

disk image sizes 1048576512 bytes = 1048576*1000+512 (formula from the test case)

- first installed Lucid server i386 to reproduce the problem:
kvm -m 512 -cdrom /store/Kits/isos/lucid/ubuntu-10.04.1-server-i386.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
- created RAID1 on sda1 and sdb1 (two partitions which stretch to the end of each disk)
- after reboot got the (initramfs) prompt and the errors

- recreated the images, reinstalled the latest Lucid:
kvm -m 512 -cdrom /store/Kits/isos/lucid/mini.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
- mini.iso taken from http://archive.ubuntu.com/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/mini.iso , and booted with "cli apt-setup/proposed=true" parameters
- after install the guest booted just fine, RAID members were sync'ed, no problems.

Looks like the fix is working.

Revision history for this message

Orticio Jlgtgutisu (jlgutisu3) wrote on 2011-01-25: RE: [Bug 569900] Re: partman sometimes creates partitions such that there is ambiguity between whether the superblock is on the disk device or the partition device

#93

Download full text (4.0 KiB)

Only lenguage spanish

> Date: Tue, 25 Jan 2011 21:06:44 +0000
> From: <email address hidden>
> To: <email address hidden>
> Subject: [Bug 569900] Re: partman sometimes creates partitions such that there is ambiguity between whether the superblock is on the disk device or the partition device
>
> Did the test on Lucid like this:
>
> - created two identical qcow2 images:
> kvm-img create -f qcow2 disk1.qcow2 1048576512
> kvm-img create -f qcow2 disk2.qcow2 1048576512
>
> disk image sizes 1048576512 bytes = 1048576*1000+512 (formula from the
> test case)
>
> - first installed Lucid server i386 to reproduce the problem:
> kvm -m 512 -cdrom /store/Kits/isos/lucid/ubuntu-10.04.1-server-i386.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
> - created RAID1 on sda1 and sdb1 (two partitions which stretch to the end of each disk)
> - after reboot got the (initramfs) prompt and the errors
>
> - recreated the images, reinstalled the latest Lucid:
> kvm -m 512 -cdrom /store/Kits/isos/lucid/mini.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
> - mini.iso taken from http://archive.ubuntu.com/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/mini.iso , and booted with "cli apt-setup/proposed=true" parameters
> - after install the guest booted just fine, RAID members were sync'ed, no problems.
>
> Looks like the fix is working.
>
> --
> You received this bug notification because you are subscribed to Ubuntu
> ubuntu-10.04.2.
> https://bugs.launchpad.net/bugs/569900
>
> Title:
> partman sometimes creates partitions such that there is ambiguity
> between whether the superblock is on the disk device or the partition
> device
>
> Status in “grub2” package in Ubuntu:
> Invalid
> Status in “partman-base” package in Ubuntu:
> Fix Released
> Status in “grub2” source package in Lucid:
> Invalid
> Status in “partman-base” source package in Lucid:
> Fix Committed
> Status in “grub2” source package in Maverick:
> Invalid
> Status in “partman-base” source package in Maverick:
> Fix Released
>
> Bug description:
> Binary package hint: mdadm
>
> In a KVM, I can do this just fine:
>
> * Using 2 virtual disk images
> * Install Lucid Server amd64
> * Both disks partitioned to just one large Linux raid partition
> * RAID1 these two together, /dev/md0
> * Put / on an ext4 filesystem on /dev/md0
> * Install
>
> The above works.
>
> However, I have spent my entire weekend trying to get 10.04 on a RAID1
> of two 500GB SATA disks, without success.
>
> I partitioned them the same as above. And conducted the install.
>
> When I boot into the new system, I get dropped to an initramfs shell.
>
> I can see that /dev/md0 exists, and is in the process of resyncing.
>
> I try to "mount /dev/md0 /root" and I get:
> mount: mounting /dev/md0 on /root/ failed: Invalid argument
>
> Also, see something else that's odd... My /dev/md0 looks "correct",
> in that it's composed of /dev/sda1 and /dev/sdb1. However, I also see
> a /dev/md0p1, which is composed of /dev/sda and /dev/sdb (the whole
> disks?). Furthermore, if I go into /dev/disk/b...

Only lenguage spanish

> Date: Tue, 25 Jan 2011 21:06:44 +0000
> From: 569900@bugs.launchpad.net
> To: jlgutisu3@hotmail.com
> Subject: [Bug 569900] Re: partman sometimes creates partitions such that there	is ambiguity between whether the superblock is on the disk	device or the partition device
> 
> Did the test on Lucid like this:
> 
> - created two identical qcow2 images:
>     kvm-img create -f qcow2 disk1.qcow2 1048576512
>     kvm-img create -f qcow2 disk2.qcow2 1048576512
> 
> disk image sizes 1048576512 bytes = 1048576*1000+512 (formula from the
> test case)
> 
> - first installed Lucid server i386 to reproduce the problem:
>     kvm -m 512 -cdrom /store/Kits/isos/lucid/ubuntu-10.04.1-server-i386.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
> - created RAID1 on sda1 and sdb1 (two partitions which stretch to the end of each disk)
> - after reboot got the (initramfs) prompt and the errors
> 
> - recreated the images, reinstalled the latest Lucid:
>     kvm -m 512 -cdrom /store/Kits/isos/lucid/mini.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
> - mini.iso taken from http://archive.ubuntu.com/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/mini.iso , and booted with "cli apt-setup/proposed=true" parameters
> - after install the guest booted just fine, RAID members were sync'ed, no problems.
> 
> Looks like the fix is working.
> 
> -- 
> You received this bug notification because you are subscribed to Ubuntu
> ubuntu-10.04.2.
> https://bugs.launchpad.net/bugs/569900
> 
> Title:
>   partman sometimes creates partitions such that there is ambiguity
>   between whether the superblock is on the disk device or the partition
>   device
> 
> Status in “grub2” package in Ubuntu:
>   Invalid
> Status in “partman-base” package in Ubuntu:
>   Fix Released
> Status in “grub2” source package in Lucid:
>   Invalid
> Status in “partman-base” source package in Lucid:
>   Fix Committed
> Status in “grub2” source package in Maverick:
>   Invalid
> Status in “partman-base” source package in Maverick:
>   Fix Released
> 
> Bug description:
>   Binary package hint: mdadm
> 
>   In a KVM, I can do this just fine:
> 
>    * Using 2 virtual disk images
>    * Install Lucid Server amd64
>    * Both disks partitioned to just one large Linux raid partition
>    * RAID1 these two together, /dev/md0
>    * Put / on an ext4 filesystem on /dev/md0
>    * Install
> 
>   The above works.
> 
>   However, I have spent my entire weekend trying to get 10.04 on a RAID1
>   of two 500GB SATA disks, without success.
> 
>   I partitioned them the same as above.  And conducted the install.
> 
>   When I boot into the new system, I get dropped to an initramfs shell.
> 
>   I can see that /dev/md0 exists, and is in the process of resyncing.
> 
>   I try to "mount /dev/md0 /root" and I get:
>   mount: mounting /dev/md0 on /root/ failed: Invalid argument
> 
>   Also, see something else that's odd...  My /dev/md0 looks "correct",
>   in that it's composed of /dev/sda1 and /dev/sdb1.  However, I also see
>   a /dev/md0p1, which is composed of /dev/sda and /dev/sdb (the whole
>   disks?).  Furthermore, if I go into /dev/disk/by-uuid, there is only
>   one symlink there, pointing to /dev/md0p1.  And this UUID is what is
>   in fact in grub as the root device.  That looks quite wrong.
> 
>   This looks pretty release-critical, to me, as it's affecting RAID
>   installs of the server.
> 
>   TEST CASE: The above problem should arise when attempting a RAID
>   install on any disk whose size is between 1048576*n+512 and
>   1048576*n+65535 bytes, for integer values of n.  In order to reproduce
>   this, the root filesystem should be created on a RAID array whose
>   member devices extend all the way to the end of the disk (i.e. accept
>   the default size for the partition in the installer).
> 
>   To validate this from -proposed (once available), please note that you
>   will need to use a netboot installation image and boot with apt-
>   setup/proposed=true on the kernel command line.
> 
>

Martin Pitt (pitti) on 2011-01-26

tags:

added: verification-done
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-01-26:

#94

This bug was fixed in the package partman-base - 139ubuntu7

---------------
partman-base (139ubuntu7) lucid-proposed; urgency=low

  * Expand the small gap we leave at the end of the disk to avoid MD
    superblock ambiguity so that it correctly covers the region where
    ambiguity might arise. The previous gap was insufficient on disks that
    were between 512 and 65535 bytes larger than a multiple of 1048576 bytes
    (LP: #569900).
-- Colin Watson <email address hidden> Mon, 06 Dec 2010 16:12:28 +0000

Changed in partman-base (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Revision history for this message

Christian Brandt (brandtc) wrote on 2011-01-26:

#95

There are other way to get this bug too and it will haunt us back in the future and not only within Ubuntu

It hit me when partitionising with fdisk -u /dev/sdx but not with fdisk /dev/sdx because the first fits tightly to the end of the drive, the second leaves some space. The same happened when using GPT through cfdisk (not sure, its been a while).

All in all it shouldn't be wrong to use a tightly fitting partition table. The real problem here is that a 0.9 superblock can not be reliably source to either a device or a partition (or a LVM, actually I can imagine scenarios where its not only about drive or partition but in addition about other mappings like LVM, crypt, hey, even a stupidily placed part of a filesystem could qualify as a superblock). Until now it worked because scanning for superblock accidentially used a less error prone sequence for scanning (in fact even then the scanning usually ran head first into a wall but accidentially this didn't get through to the user)

In short: Placing vital information at the end of a bunch of sectors and hoping for a successful poking-around by the startup is stupid and prone for error.

Everyone should use front-aligned superblocks, that is version 1.1 and/or 1.2 because every known mapping (LVM, MD, Crypt, filesystems) are able to preserve lead-in-gaps and deliver this vital information to the next layer. Not so for for lead-out-gaps.

Revision history for this message

Orticio Jlgtgutisu (jlgutisu3) wrote on 2011-01-26:

#96

Download full text (4.3 KiB)

Por favor, lenguaje en Español

> Date: Wed, 26 Jan 2011 17:24:36 +0000
> From: <email address hidden>
> To: <email address hidden>
> Subject: [Bug 569900] Re: partman sometimes creates partitions such that there is ambiguity between whether the superblock is on the disk device or the partition device
>
> There are other way to get this bug too and it will haunt us back in the
> future and not only within Ubuntu
>
> It hit me when partitionising with fdisk -u /dev/sdx but not with fdisk
> /dev/sdx because the first fits tightly to the end of the drive, the
> second leaves some space. The same happened when using GPT through
> cfdisk (not sure, its been a while).
>
> All in all it shouldn't be wrong to use a tightly fitting partition
> table. The real problem here is that a 0.9 superblock can not be
> reliably source to either a device or a partition (or a LVM, actually I
> can imagine scenarios where its not only about drive or partition but in
> addition about other mappings like LVM, crypt, hey, even a stupidily
> placed part of a filesystem could qualify as a superblock). Until now it
> worked because scanning for superblock accidentially used a less error
> prone sequence for scanning (in fact even then the scanning usually ran
> head first into a wall but accidentially this didn't get through to the
> user)
>
> In short: Placing vital information at the end of a bunch of sectors and
> hoping for a successful poking-around by the startup is stupid and prone
> for error.
>
> Everyone should use front-aligned superblocks, that is version 1.1
> and/or 1.2 because every known mapping (LVM, MD, Crypt, filesystems) are
> able to preserve lead-in-gaps and deliver this vital information to the
> next layer. Not so for for lead-out-gaps.
>
> --
> You received this bug notification because you are subscribed to Ubuntu
> ubuntu-10.04.2.
> https://bugs.launchpad.net/bugs/569900
>
> Title:
> partman sometimes creates partitions such that there is ambiguity
> between whether the superblock is on the disk device or the partition
> device
>
> Status in “grub2” package in Ubuntu:
> Invalid
> Status in “partman-base” package in Ubuntu:
> Fix Released
> Status in “grub2” source package in Lucid:
> Invalid
> Status in “partman-base” source package in Lucid:
> Fix Released
> Status in “grub2” source package in Maverick:
> Invalid
> Status in “partman-base” source package in Maverick:
> Fix Released
>
> Bug description:
> Binary package hint: mdadm
>
> In a KVM, I can do this just fine:
>
> * Using 2 virtual disk images
> * Install Lucid Server amd64
> * Both disks partitioned to just one large Linux raid partition
> * RAID1 these two together, /dev/md0
> * Put / on an ext4 filesystem on /dev/md0
> * Install
>
> The above works.
>
> However, I have spent my entire weekend trying to get 10.04 on a RAID1
> of two 500GB SATA disks, without success.
>
> I partitioned them the same as above. And conducted the install.
>
> When I boot into the new system, I get dropped to an initramfs shell.
>
> I can see that /dev/md0 exists, and is in the process of resyncing.
>
> I try to "mount /dev/md0 /root" an...

Por favor, lenguaje en Español

> Date: Wed, 26 Jan 2011 17:24:36 +0000
> From: brandtc@psi5.com
> To: jlgutisu3@hotmail.com
> Subject: [Bug 569900] Re: partman sometimes creates partitions such that there	is ambiguity between whether the superblock is on the disk	device or the partition device
> 
> There are other way to get this bug too and it will haunt us back in the
> future and not only within Ubuntu
> 
> It hit me when partitionising with fdisk -u /dev/sdx but not with fdisk
> /dev/sdx because the first fits tightly to the end of the drive, the
> second leaves some space. The same happened when using GPT through
> cfdisk (not sure, its been a while).
> 
> All in all it shouldn't be wrong to use a tightly fitting partition
> table. The real problem here is that a 0.9 superblock can not be
> reliably source to either a device or a partition (or a LVM, actually I
> can imagine scenarios where its not only about drive or partition but in
> addition about other mappings like LVM, crypt, hey, even a stupidily
> placed part of a filesystem could qualify as a superblock). Until now it
> worked because scanning for superblock accidentially used a less error
> prone sequence for scanning (in fact even then the scanning usually ran
> head first into a wall but accidentially this didn't get through to the
> user)
> 
> In short: Placing vital information at the end of a bunch of sectors and
> hoping for a successful poking-around by the startup is stupid and prone
> for error.
> 
> Everyone should use front-aligned superblocks, that is version 1.1
> and/or 1.2 because every known mapping (LVM, MD, Crypt, filesystems) are
> able to preserve lead-in-gaps and deliver this vital information to the
> next layer. Not so for for lead-out-gaps.
> 
> -- 
> You received this bug notification because you are subscribed to Ubuntu
> ubuntu-10.04.2.
> https://bugs.launchpad.net/bugs/569900
> 
> Title:
>   partman sometimes creates partitions such that there is ambiguity
>   between whether the superblock is on the disk device or the partition
>   device
> 
> Status in “grub2” package in Ubuntu:
>   Invalid
> Status in “partman-base” package in Ubuntu:
>   Fix Released
> Status in “grub2” source package in Lucid:
>   Invalid
> Status in “partman-base” source package in Lucid:
>   Fix Released
> Status in “grub2” source package in Maverick:
>   Invalid
> Status in “partman-base” source package in Maverick:
>   Fix Released
> 
> Bug description:
>   Binary package hint: mdadm
> 
>   In a KVM, I can do this just fine:
> 
>    * Using 2 virtual disk images
>    * Install Lucid Server amd64
>    * Both disks partitioned to just one large Linux raid partition
>    * RAID1 these two together, /dev/md0
>    * Put / on an ext4 filesystem on /dev/md0
>    * Install
> 
>   The above works.
> 
>   However, I have spent my entire weekend trying to get 10.04 on a RAID1
>   of two 500GB SATA disks, without success.
> 
>   I partitioned them the same as above.  And conducted the install.
> 
>   When I boot into the new system, I get dropped to an initramfs shell.
> 
>   I can see that /dev/md0 exists, and is in the process of resyncing.
> 
>   I try to "mount /dev/md0 /root" and I get:
>   mount: mounting /dev/md0 on /root/ failed: Invalid argument
> 
>   Also, see something else that's odd...  My /dev/md0 looks "correct",
>   in that it's composed of /dev/sda1 and /dev/sdb1.  However, I also see
>   a /dev/md0p1, which is composed of /dev/sda and /dev/sdb (the whole
>   disks?).  Furthermore, if I go into /dev/disk/by-uuid, there is only
>   one symlink there, pointing to /dev/md0p1.  And this UUID is what is
>   in fact in grub as the root device.  That looks quite wrong.
> 
>   This looks pretty release-critical, to me, as it's affecting RAID
>   installs of the server.
> 
>   TEST CASE: The above problem should arise when attempting a RAID
>   install on any disk whose size is between 1048576*n+512 and
>   1048576*n+65535 bytes, for integer values of n.  In order to reproduce
>   this, the root filesystem should be created on a RAID array whose
>   member devices extend all the way to the end of the disk (i.e. accept
>   the default size for the partition in the installer).
> 
>   To validate this from -proposed (once available), please note that you
>   will need to use a netboot installation image and boot with apt-
>   setup/proposed=true on the kernel command line.
> 
>

Revision history for this message

Colin Watson (cjwatson) wrote on 2011-02-10:

#97

Christian: We use metadata version 1.1 by default as of Natty. You're right that it's generally a much better design. However, this is not easily backportable, at least not to Lucid, because we only added support for 1.x superblocks to GRUB in Maverick.

Ubuntu Foundations Team Bug Bot (crichton) on 2011-09-19

tags:

added: testcase

Revision history for this message

astrostl (astrostl) wrote on 2012-12-15:

#98

I think I ran into this. Have an 11.04 system, set up ways unknown. sdc1 and sdd1 were mirrored via mdadm, and sdd failed. I replaced it with an identical drive, fdisked it, and was told that it was too small when I tried to add it back into the array.

fdisk -l /dev/sdc output:

Disk /dev/sdc: 300.1 GB, 300069052416 bytes
255 heads, 63 sectors/track, 36481 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0008dc33

Device Boot Start End Blocks Id System
/dev/sdc1 1 36482 293035008 fd Linux raid autodetect

36481 cylinder disk, with a partition ending on non-existent cylinder 36482.

I could get it to start rebuilding by adding the entire /dev/sdd device rather than a partition, but it felt weird pairing a partition with a device. I dug around a bit on Google, and ended up here.

My "solution": sfdisk -d /dev/sdc|sfdisk --force /dev/sdd (using --force because sfdisk doesn't like the way sdc1 looks either).

That got me a clone of sdc's partition layout. It doesn't explain how it got there to start, though, or if I'm skating on thin ice overall.

Revision history for this message

Phillip Susi (psusi) wrote on 2012-12-15:

#99

Your issue does not seem to be related to this bug report.

Ubuntu
partman-base package

partman sometimes creates partitions such that there is ambiguity between whether the superblock is on the disk device or the partition device

Bug Description

Related branches

Other bug subscribers

Remote bug watches

	Status	Importance	Assigned to	Milestone
grub2 (Ubuntu)	Invalid	Undecided	Unassigned
Lucid	Invalid	Undecided	Unassigned
Maverick	Invalid	Undecided	Unassigned
partman-base (Ubuntu)	Fix Released	High	Colin Watson	Ubuntu ubuntu-10.10
Lucid	Fix Released	High	Colin Watson	Ubuntu ubuntu-10.04.2
Maverick	Fix Released	High	Colin Watson	Ubuntu ubuntu-10.10

Ubuntupartman-base package

partman sometimes creates partitions such that there is ambiguity between whether the superblock is on the disk device or the partition device

Bug Description

Related branches

Other bug subscribers

Remote bug watches

Ubuntu
partman-base package