Ubuntu

partman sometimes creates partitions such that there is ambiguity between whether the superblock is on the disk device or the partition device

Reported by Dustin Kirkland  on 2010-04-25
This bug affects 34 people
Affects / Status / Importance / Assigned to:
- grub2 (Ubuntu): Undecided, Unassigned
- grub2 (Ubuntu Lucid): Undecided, Unassigned
- grub2 (Ubuntu Maverick): Undecided, Unassigned
- partman-base (Ubuntu): High, Colin Watson
- partman-base (Ubuntu Lucid): High, Colin Watson
- partman-base (Ubuntu Maverick): High, Colin Watson

Bug Description

Binary package hint: mdadm

In a KVM, I can do this just fine:

 * Using 2 virtual disk images
 * Install Lucid Server amd64
 * Both disks partitioned to just one large Linux raid partition
 * RAID1 these two together, /dev/md0
 * Put / on an ext4 filesystem on /dev/md0
 * Install

The above works.

However, I have spent my entire weekend trying to get 10.04 on a RAID1 of two 500GB SATA disks, without success.

I partitioned them the same as above. And conducted the install.

When I boot into the new system, I get dropped to an initramfs shell.

I can see that /dev/md0 exists, and is in the process of resyncing.

I try to "mount /dev/md0 /root" and I get:
mount: mounting /dev/md0 on /root/ failed: Invalid argument

Also, I see something else that's odd... My /dev/md0 looks "correct", in that it's composed of /dev/sda1 and /dev/sdb1. However, I also see a /dev/md0p1, which is composed of /dev/sda and /dev/sdb (the whole disks?). Furthermore, if I go into /dev/disk/by-uuid, there is only one symlink there, pointing to /dev/md0p1. And this UUID is what is in fact in grub as the root device. That looks quite wrong.

This looks pretty release-critical, to me, as it's affecting RAID installs of the server.

TEST CASE: The above problem should arise when attempting a RAID install on any disk whose size is between 1048576*n+512 and 1048576*n+65535 bytes, for integer values of n. In order to reproduce this, the root filesystem should be created on a RAID array whose member devices extend all the way to the end of the disk (i.e. accept the default size for the partition in the installer).
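The size condition in the test case can be checked mechanically. A quick sketch (the 500107862016-byte figure is the size of the 500GB drives shown in the fdisk output later in this thread):

```python
def affected(disk_size_bytes):
    """True if the size is between 1048576*n+512 and 1048576*n+65535
    bytes for some integer n, per the test case above."""
    remainder = disk_size_bytes % 1048576
    return 512 <= remainder <= 65535

# The 500GB drives reported in this thread:
print(affected(500107862016))  # True (remainder is 24576)
```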

To validate this from -proposed (once available), please note that you will need to use a netboot installation image and boot with apt-setup/proposed=true on the kernel command line.

Changed in mdadm (Ubuntu):
importance: Undecided → High
Changed in mdadm (Ubuntu Lucid):
milestone: none → ubuntu-10.04
Dustin Kirkland  (kirkland) wrote :

Okay, a little more information ...

Looks like the installer is trying to "partition" /dev/md0 (and comes up with this /dev/md0p1). This doesn't seem to work very well at all.

Dustin Kirkland  (kirkland) wrote :

Okay, so I finally managed to get Lucid installed on a RAID1 root disk, but the procedure isn't pretty...

 * I booted a Desktop livecd
 * Popped open a terminal
 * partitioned both of my disks using fdisk, sda1 and sdb1, both 0xfd (linux raid)
 * installed mdadm in the live cd environment
 * mdadm --create /dev/md0 -n 2 -l 1 /dev/sda1 /dev/sdb1
 * mkfs.ext4 /dev/md0
 * then installed Ubuntu using the wizard
 * after the install completed, drop to a shell and chroot to /target, and apt-get install mdadm

Note that the mkfs.ext4 step seems to be the critical one... If I don't do that, and I fire up the installer, it goes and tries to partition /dev/md0, yielding a /dev/md0p1 which is unusable. This seems to be the same thing the server installer did too.

Now, I'm up and running Ubuntu Desktop with / on a RAID1. Not Server. So this still isn't ideal. But this work around hopefully shows a bit more about where the bug is.

Dustin Kirkland  (kirkland) wrote :

Also related, I'm seeing some problems with the line "DEVICE partitions" in /etc/mdadm/mdadm.conf.

It seems that it interprets both /dev/sda and /dev/sda1 as separate "partitions" containing RAID superblocks. This seems wrong, to me, as only /dev/sda1 should be in consideration. This causes mdadm to start some funny, incorrect raid devices. This might be part of the cause of this bad behavior.

Imre Gergely (cemc) wrote :

I'm just curious why this works in a KVM environment and not on real hardware. I obviously know what the difference is in terms of "hardware", but from the POV of the raid it shouldn't be that much different, right?

Those disks are identical, there's no other sd* there to throw it off, it doesn't boot from sdb, nothing else fancy is going on? Did you try zeroing the two disks before install (like with dd if=/dev/zero of=/dev/sd{a,b})? Just to be sure.

ceg (ceg) wrote :

You're probably hitting several bugs.

I have also seen some differences in the geometry of virtual drives (at least I tend to size them to some human-friendly number). I have seen some "unused space" oddities and (c)fdisk complaining.

IIRC the most useful way to use md's partitionable-array feature is to use the entire disks ("sdX") as members. Then you can partition the md device, and thereby partition all mirror devices at the same time. If you create the array from one member + "missing", the partition table on the disk gets used for the raid. Mdadm will only create a configurable number of device nodes for partitions, though.

Does blkid detect sdX wrongly as raid?

mdadm --incremental defaults to setting up arrays with partitionable device nodes if auto= is not defined otherwise in mdadm.conf. Not a bad idea in general, but mdadm is only able to do so during the initramfs (the map file is missing later). Also, devices are not automatically removed from the array, which blocks re-addition by --incremental. See Bug #495370.

Bug #551719: enabled kernel raid autodetection disturbs udev/mdadm (initramfs & later)

ceg (ceg) wrote :

i.e. you'll get incomplete md_XpY devices if mdadm is not (re)installed (or you don't create an mdadm.conf manually) after creating the array; this has been the case for ages. Bug #252345

ceg (ceg) wrote :

(Just to reassure you: I agree that https://wiki.ubuntu.com/ReliableRaid could be seen as release-critical for an LTS release, not only for a server/workstation edition.)

Serge van Ginderachter (svg) wrote :

I just downloaded ubuntu-10.04-rc-server-amd64.iso and used it on a simple desktop machine to which I added a second disk, both on SATA. The first disk is around 500GB, the second only 160GB.
Running the installer and partitioning manually, I made a partition of around 160GB on both disks, and left the rest of the 500GB one unused. The two 160GB partitions weren't exactly the same size, which left a small piece 'unused' as reported by the installer.
Further on I installed, leaving most options default: one MD raid device on both 160GB partitions, formatting md0 as ext4 and putting / on it.

Reboot, no problem.

Imre Gergely (cemc) wrote :

I have an older machine with two identical 80GB IDE harddisks. Unfortunately it is not 64-bit, so I grabbed the 32-bit server RC iso and installed on the two, with one big raid partition on each (sda1/sdb1), RAID1, no swap, and ext4 on it.

The exact steps:

- create bootable USB stick with 10.04-rc server 32bit on it (I've created this with Karmic)
- boot the system from the USB, install
- partitioned the two disks manually for sda1/sdb1 as RAID partitions (no swap)
- created md0 as a RAID1 raid
- md0 mounted as / (root partition) with ext4 (everything else left as default)
(answered 'Yes' when asked if I wanted to boot in case of degraded RAID)
- installed the system
- reboot, everything is fine, checked /proc/mdstat and the RAID was indeed sync'ing, but it did boot and no mention of md0pX

A small note: I didn't see ANY kind of messages during the boot, no GRUB menu, no nothing, just a cursor in the upper left, then the login prompt. A bit strange ;)

Imre Gergely (cemc) wrote :

Then I reinstalled with the 'boot degraded RAID' option set to 'No', and it's still working, no errors, no problems at booting.

Imre Gergely (cemc) wrote :

Turns out my 'old' system supports 64bit, so I've reinstalled again, this time from ubuntu-10.04-rc-server-amd64.iso. Same setup as above, and it IS working as expected. It's sync'ing but it booted without problems (with the 'boot degraded RAID' option set to 'No').

I can't seem to reproduce this bug... for now.

Imre Gergely (cemc) wrote :

Looking in /dev/disk/by-uuid I see one link pointing to ../../md0, and there's still no md0pX or anything else out of the ordinary.

Dustin Kirkland  (kirkland) wrote :

Imre and Serge-

Thanks... So this must be a disk-geometry-specific problem. (I actually went to Fry's and bought a new motherboard/CPU after the first few failures, thinking the hardware was flaking out; same thing with the new CPU/motherboard.)

My two disks are 500.1 GB SATA drives. One is Maxtor, the other Seagate.

Imre Gergely (cemc) wrote :

I'll have two 750GB SATA drives tomorrow, if it's not resolved by then, I'll try with those.

ceg (ceg) wrote :

What does blkid return before md's are set up? (maybe booting with break=top or break=premount)

ceg (ceg) wrote :

also maybe try bootoption raid=noautodetect to get the kernel detection out of the way

On Mon, Apr 26, 2010 at 03:06:55AM -0000, Dustin Kirkland wrote:
> It seems that it interprets both /dev/sda and /dev/sda1 as separate
> "partitions" containing RAID superblocks.

This happens if /dev/sda1 extends all the way to the end of the disk. I
made a change last month to prevent this:

partman-base (139ubuntu2) lucid; urgency=low

  * Always leave a small gap at the end of the disk (except on device-mapper
    devices), to avoid confusing mdadm by leaving the MD 0.90 superblock at
    the end of the disk as well as the end of a partition (LP: #527401).

 -- Colin Watson <email address hidden> Fri, 26 Mar 2010 11:32:01 +0000

Perhaps you can investigate why this apparently isn't working, based on this
history?
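For context, the MD 0.90 superblock lives in the last 64KiB-aligned 64KiB block of its device. A small sketch of that arithmetic (the 1MiB-aligned partition start at sector 2048 is an assumption for illustration) shows why a partition that runs all the way to the end of the disk puts its superblock exactly where an examination of the whole disk would look for one:

```python
SB_ALIGN = 65536  # MD 0.90 superblock: 64KiB block, 64KiB-aligned, at the end

def sb_offset(device_size_bytes):
    """Byte offset of the MD 0.90 superblock within a device."""
    return (device_size_bytes & ~(SB_ALIGN - 1)) - SB_ALIGN

disk_size = 500107862016            # 500GB drive from this thread
part_start = 2048 * 512             # assumed 1MiB-aligned partition start
part_size = disk_size - part_start  # partition extends to the end of the disk

# Absolute position of the partition's superblock on the raw disk:
part_sb_abs = part_start + sb_offset(part_size)
print(part_sb_abs == sb_offset(disk_size))  # True: same location, hence the ambiguity
```

Because the partition start is a multiple of 64KiB, the superblock at the end of the partition lands on the same byte offset the kernel computes for the whole disk, so both /dev/sda and /dev/sda1 appear to carry a valid superblock.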

If you did not re-format, you may also see Bug #527401. (There are other install issues in the comments there as well.)

Generally, if your disks already have some preexisting superblocks on them (even though you deleted the partitions), blkid, partman, mdadm etc. can get confused. When you recreate (similar) partitions, they are redetected and md devices get set up.

ceg-

I found it best if I zero'd the entire disk between reinstallation attempts.

As I said on IRC, I've got the two 750GB disks and tried installing on them with RAID1, but everything went fine; there were no problems at boot.
I've tried with KVM with 250GB disks also, no problems there either.

Can't reproduce.

HX_unbanned (linards-liepins) wrote :

Given the statements that this bug is unreproducible, I am setting it to Invalid. Feel free to change it to Incomplete if it occurs again; after that, regression testing can be done.

Changed in mdadm (Ubuntu Lucid):
status: New → Invalid
Alex Kuretz (akuretz) wrote :

I'm having the same issue installing 10.04 Server on a Supermicro 6013P-T using two identical 500GB Seagate drives. The installation proceeds fine, grub says it installs, and upon reboot I get this mount error:
mount: mounting /dev/disk/by-uuid/<uuid> on /root/ failed: Invalid argument

The UUID for md0 doesn't exist in /dev/disk/by-uuid in the initramfs shell. If I boot off the desktop LiveCD and install mdadm I can do a scan and detect both md0 and md1 (swap), and mount them. md0 is resyncing when I do this. fsck reports no errors on md0. I've spent countless hours on this, very frustrating.

Dustin Kirkland  (kirkland) wrote :

Thanks for confirming, Alex.

So one of my two drives is a Seagate 500GB (the other is a Maxtor). Maybe that will help us narrow the affected geometry.

Changed in mdadm (Ubuntu):
status: Invalid → Confirmed
Changed in mdadm (Ubuntu Lucid):
status: Invalid → Confirmed
Alex Kuretz (akuretz) wrote :

I've now tried just installing on a single 500GB Seagate (different drive, was going to be a spare in my RAID1 array), and it won't boot either. This error is:
mount: mounting /dev/sda1 on /root failed: No such device

However in initramfs I'm able to mount /dev/sda1 /root with no problems. I don't understand, is it a controller issue? This box has SiL3112A controllers but I have the fakeraid disabled in the BIOS.

I'm sorry to report the same issue in a non-raid config, I hope that doesn't throw your original bug report off track. :(

Alex Kuretz (akuretz) wrote :

I zero'd the drives using dd if=/dev/zero of=/dev/sd{a,b,c} bs=4k and still the system will not boot after the install completes, with the same "mounting failed: invalid argument". I also have the md1p1 device, as does Dustin.

If I boot into Live CD or even Recovery mode off of the 10.04 Server CD I can see and mount the md* devices with no problems. I do see that md0 is syncing.

Imre Gergely (cemc) wrote :

@Alex: could you tell us the exact steps you took to install it? Like everything, every option, what you did, how you installed. Could it be that it doesn't like 500GB disks? :) I've installed on 80GB and 750GB with RAID1, and found nothing. I have a 500GB WD too, and I can try that, but I don't have any fancy controller (I think). SATA is set to AHCI (as opposed to IDE) in the BIOS though, not sure if that matters.

Alex Kuretz (akuretz) wrote :

I believe I accept all defaults up until partitioning. At that point I created automatic partitions on all 3 x 500GB SATA drives. I designated the first partition (sda1, sdb1, sdc1) on all 3 drives as bootable. I then create md0, setting sda1 and sdb1 as the active drives and sdc1 as the spare. I click Finish, then create md1, setting sda5 and sdb5 as active and sdc5 as spare. I click Finish, then set md0 to use ext4 and be the root filesystem, then set md1 to be swap. Partitioning finishes, files are installed, I select LAMP and OpenSSH Server for the software options. Finally I say Yes to install grub to the MBR, and it appears to successfully install to sda and sdb. Install completes, CD pops out, and the server reboots.

Imre Gergely (cemc) wrote :

Automatic partitions? Could you try it manually, with only one disk, no RAID, one partition (sda1), ext4, no swap ?

Alex Kuretz (akuretz) wrote :

I'm at work for the day, I'll try your suggestion tonight. Note that I have tried to install to a single disk, no RAID, though I did give it a swap partition and I also did not zero the drive (described in comment #24).

Alex Kuretz (akuretz) wrote :

Imre, your suggestion worked. I zeroed the first several MB of the disk and performed the mdadm --zero-superblock as recommended by ceg in bug #527401. I did the install the same way, except I manually created the partition and did not create swap.

I then tried installing on RAID1 again, the only difference being the addition of a swap partition. All other options were the same as the single drive install. It fails with the "Invalid Argument" error. :(

I'll try again with no swap partition.

Alex Kuretz (akuretz) wrote :

And with no partitions the server won't even boot, I get an "operating system not found" message.

I give up, I've spent 20 or more hours across a dozen or more installs with 4 different hard drives in this server, and no version of Ubuntu from 9.10 to 10.04 Alpha 2 to now has been able to get RAID working on my server. I've been using various versions of Linux for more than 10 years, though rarely install the OS, so I'm not brand new to this and I've got an 8.10 server sitting next to this one that installed with RAID1 just fine over a year ago. Thanks for your suggestions.

Alex Kuretz (akuretz) wrote :

Sorry, comment 31 should say "with no swap partitions".

Master Jason (jason-rsaweb) wrote :

I am having the same problem with 2 x 500GB Seagate drives. I am in the process of installing 3 machines, each using 2 drives with software RAID1. Interestingly, the raid worked first time on the machines with 1TB and 250GB drives, but not on the 500GB drives. After reading this post, I see the majority of people are having problems with 500GB drives, and Imre Gergely wrote on 2010-04-27 (#20) that his 750GB and 250GB drives worked fine.

So I took the 500GB drives out of the server and put 2 x 250GB drives in to test... it worked first time.
Anybody care to explain why this bug seems limited to 500GB drives?

Imre Gergely (cemc) wrote :

@Jason: could you test the same without RAID ? Just a single 500GB drive ? Just to see if it's at all related to RAID or not. Just put one 500GB hdd in and install it. Then maybe take it out and install it again on the other 500GB drive, just to be sure. Delete the RAID stuff, maybe a little bit of zeroing first...

Master Jason (jason-rsaweb) wrote :

Ok .... here is an update .... I have software raid working on the 2 x 500GB drives.

Each drive has:
1) 2GB Raid
2) 490GB Raid (left off the last 8+GB)
3) 8GB free

md0 (2 x 2GB Raid) formatted as ext4, /boot
md1 (2 x 490GB Raid) formatted as LVM

LVM has -
4GB formatted as swap
486GB formatted as ext4 as /

Master Jason (jason-rsaweb) wrote :

Hey Imre Gergely, sorry I missed your earlier post; a single 500GB drive works fine.

Master Jason (jason-rsaweb) wrote :

Ok .... here is another update ....

We tested another machine.
Intel GT mainboard with Q9400 CPU and 4GB Kingston RAM and 2 x 500GB Seagate Drives

Each drive has:
1) 2GB Raid
2) 498GB Raid (remaining space)

md0 (2 x 2GB Raid) formatted as ext4, /boot
md1 (2 x 490GB Raid) formatted as LVM

LVM has -
4GB formatted as swap
486GB formatted as ext4 as /

The installation went fine, but when booting I was informed that md0 was running in degraded mode (only 1 drive) and that the root partition could not be found. Then I was dropped into the initramfs shell. We used the Live CD to check the raid configuration; everything was 100%: md0 had both drives and had sync'ed, and md1 had both drives and continued to re-sync. We could mount the drives and everything was as it should be.

We then reinstalled the machine. During the installation we removed the existing partitions and re-partitioned them as below:

Each drive has:
1) 2GB Raid
2) 495GB Raid
3) 3GB Free Space (approx)

md0 (2 x 2GB Raid) formatted as ext4, /boot
md1 (2 x 490GB Raid) formatted as LVM

LVM has -
4GB formatted as swap
486GB formatted as ext4 as /

The machine booted into Ubuntu first time.

I have no idea why I can't use the full drive.
We have tested the same with Ubuntu x64 8.04 and 8.10 - it is working 100%.
It is not working with Ubuntu x64 9.10, 10.04RC or 10.04.

Just happy to have the machines working; I am sure I won't miss 3GB.

Imre Gergely (cemc) wrote :

So you're saying if you leave a bit of free space at the end of the drive, it all works fine but if you don't leave any space, it won't boot?

Master Jason (jason-rsaweb) wrote :

Hey Imre,
It would seem that is the case.
What is the minimum amount of space? I have no idea.
I only have the results from the above tests.

Imre Gergely (cemc) wrote :

Can you paste the output of "fdisk -l /dev/sda" and "hdparm -I /dev/sda | head -30" of the 500GB Seagate disk? You did install Ubuntu 10.04 server 64bit, right?

Alex Kuretz (akuretz) wrote :

I gave up on the Supermicro and last night successfully installed on a newer Dell 860 with 2 x 250GB drives. I will be installing on my 500GB drives tonight and will let you know if the problem occurs there. I've also got two 750GB drives I can try.

Thomas Krause (krause) wrote :

I've got the same problems on a Dell PowerEdge T110 with four 500.1 GB disks from Seagate (ST3500320NS) when trying to combine two of them to a RAID1.

I remember that there was an error in dmesg, after falling back to the initramfs console, about the ext4 filesystem (something with an unexpected size) and a message like

md0p1 bad geometry block count exceeds size of device

(restored from my search history today).

I'll post the exact error message tomorrow, when I have access to the server again, if you like. I may also try some of the workarounds in order to get the server up and running tomorrow, but since it's a replacement for an existing server I could delay the workarounds for some days in order to assist in debugging this issue.

Master Jason (jason-rsaweb) wrote :

Hey Guys,

I can confirm it was Ubuntu x64 10.04 (ubuntu-10.04-server-amd64.iso)

root@xxxxxxxxx:~# uname -a
Linux xxxxxxxxx 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux

root@xxxxxxxxx:~# fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00099c9c

   Device Boot Start End Blocks Id System
/dev/sda1 * 1 244 1951744 fd Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2 244 59816 478515200 fd Linux raid autodetect

root@xxxxxxxxx:~# hdparm -I /dev/sda | head -30

/dev/sda:

ATA device, with non-removable media
        Model Number: ST3500418AS
        Serial Number: 9VMCY42L
        Firmware Revision: CC44
        Transport: Serial
Standards:
        Used: unknown (minor revision code 0x0029)
        Supported: 8 7 6 5
        Likely used: 8
Configuration:
        Logical max current
        cylinders 16383 16383
        heads 16 16
        sectors/track 63 63
        --
        CHS current addressable sectors: 16514064
        LBA user addressable sectors: 268435455
        LBA48 user addressable sectors: 976773168
        Logical/Physical Sector size: 512 bytes
        device size with M = 1024*1024: 476940 MBytes
        device size with M = 1000*1000: 500107 MBytes (500 GB)
        cache/buffer size = 16384 KBytes
        Nominal Media Rotation Rate: 7200
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum

Alex Kuretz (akuretz) wrote :

32-bit Ubuntu 10.04 for me. I performed the same install on 500GB drives that I had done successfully on the Dell 860 with 250GB drives last night, and it failed with the Invalid Argument message. As strange as it sounds, the 500GB drives are the only common denominator. I zeroed the drives and ran mdadm --zero-superblock on them prior to performing the install.

Thomas Krause (krause) wrote :

The exact error message that I found using dmesg was

EXT4-fs (md1p2): bad geometry: block count 119655152 exceeds size of device (117213680 blocks)

I also had a

md1: detected capacity change from 0 to 490107502592

and

md1: p2 size 957241344 exceeds device capacity, limited to end of disk

md1 was configured as the system root partition with Ext4, and md0 as swap (which seemed to work). For installation I followed the instructions at https://help.ubuntu.com/10.04/serverguide/C/advanced-installation.html

BTW, I also used the amd64 server version of Lucid.

Thomas Krause (krause) wrote :

Ok, I think found the problem and a solution/workaround:

When you ask the installer to partition the free space on the disk, it will prompt you with the size of the new partition. By default this is "500.1 GB". If you first add a smaller swap and then a second partition, the latter will be smaller, but still something like "482.1 GB".

If you enter "500GB" by hand (or, with multiple partitions, sizes whose sum is not bigger than 500GB), then everything works fine and as it should.

Maybe 500.1 GB *is* the right number for the 500GB drive, but I somehow doubt it and blame the installer for choosing a wrong default value ;-)

midair77 (midair77) wrote :

I just tried to install 10.04 amd64 server on a box with 3 500G WD and 1 500G Hitachi Drives. I wanted to setup Raid 5 and then LVM for all the partitions. I used the latest ISO files.

Raid 5 for sda1,sdb1,sdc1,sdd1
LVM named system: /, /boot, /home, /var, /tmp, /usr, /usr/local, /opt

After installation and the box rebooted, I got error and initramfs prompt.

ALERT!! /dev/mapper/system-root does not exist. Dropping to a shell!

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-2.6.32-21-server root=/dev/mapper/system-root ro quiet

cat /proc/mdstat and I saw that md0 raid 5 with sd[abcd] is resyncing...

ls /dev/mapper
control
ls /dev/md*
/dev/md0p1 /dev/md0

As you can see there is no system-root and etc under /dev/mapper.

I saw that in /etc/mdadm/mdadm.conf
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=........

ls /dev/disk/by-*
/dev/disk/by-id/ /dev/disk/by-path

As you can see there is no /dev/disk/by-uuid in /dev/disk.

Previously, I encountered similar problems when installing on a system with 2 250GB WD disks (one big RAID1 and then LVM for all partitions). I then tried to set up 3 small RAID1 partitions for swap, / and /boot, and 1 big RAID1 with LVM for the other partitions. With this setup, 10.04 amd64 server was able to boot up successfully.

This is a very serious bug, considering that a lot of people will be running the server edition with some type of RAID and the installer fails to set these up correctly. This bug is a show-stopper and makes users ponder the quality of Ubuntu releases.

Manny Vindiola (serialorder) wrote :

I can also confirm for Ubuntu 10.04 Lucid Server (64 bit)

I tried installing with a raid 1 on two WD Black 500GB hard drives.
I get the no init found error and am dropped to a busybox shell

I have three partitions one for /boot, one for /root, and one for /swap

Like @Thomas I also get this type error

md1: detected capacity change from 0 to 490107502592

and

md1: p2 size 957241344 exceeds device capacity, limited to end of disk

and like @midair77 I also get this type of error
ALERT!! /dev/md1 does not exist. Dropping to a shell!

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-2.6.32-21-server root=/dev/md1 ro quiet

I just took a look at my partition table and I think I noticed something that may be contributing to the problem:

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

   Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 fd Linux raid autodetect
/dev/sda2 14 59829 480468992 fd Linux raid autodetect
/dev/sda3 59829 60802 7813120 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

   Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 59829 480468992 fd Linux raid autodetect
/dev/sdb3 59829 60802 7813120 fd Linux raid autodetect

If you notice, both HDDs have a total of 60801 cylinders, yet on both drives the end cylinder of the 3rd partition is 60802, which is beyond the end of the disk.

Both of these partition tables were created with the installer's partitioner. I am reinstalling now with partitions I create myself with (s)fdisk. I will let you know if that works.
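The overrun can be checked against the fdisk numbers above. A sketch using the reported CHS geometry (255 heads * 63 sectors/track = 16065 sectors per cylinder):

```python
SECTORS_PER_CYLINDER = 255 * 63     # 16065, per the fdisk output above
DISK_SECTORS = 500107862016 // 512  # 976773168 sectors on the 500.1 GB disk

# The installer-created sda3/sdb3 end on cylinder 60802, but the
# disk only has 60801 cylinders:
end_sector = 60802 * SECTORS_PER_CYLINDER
print(end_sector > DISK_SECTORS)  # True: the partition extends past the disk
```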

Manny Vindiola (serialorder) wrote :

I can confirm that creating the partitions by hand worked and the system is now able to boot. The current disk configuration is:

~# fdisk -l
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

   Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 fd Linux raid autodetect
/dev/sda2 14 59829 480468992 fd Linux raid autodetect
/dev/sda3 59829 60801 7813120 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a2420

   Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 59829 480468992 fd Linux raid autodetect
/dev/sdb3 59829 60801 7813120 fd Linux raid autodetect

Dustin Kirkland  (kirkland) wrote :

Colin-

We have a few people confirming this now. Any chance you could take another look?

Changed in mdadm (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
Vladimir Smolensky (arizal) wrote :

Same problems here, and manually creating raids with mdadm seems to lead to the same problems.

Looks like the raid driver thinks the last raid's superblock is actually the superblock for the entire disk!
In our case, we have /dev/md3 consisting of sda6 and sdb6. The array was made manually with 'mdadm --create' AFTER installing the machine, not from the installer! I did this because partitioning the disk from the installer always led to broken arrays after reboot.

After creating the array, on the next boot we get an md3 device assembled from sda and sdb, and several other arrays created from md3p1, md3p2... and so on... a real mess.
Checking with 'mdadm --examine' shows that the metainfo for sda6/sdb6 is exactly the same as the one for sda/sdb: same components and UUID.

If I remember right, the kernel should only auto-assemble arrays from partitions of type FD (Linux raid), so it's unclear why it assembles the array from sda and sdb!!

Vladimir Smolensky (arizal) wrote :

And yes, our disks are 500GB too

=== START OF INFORMATION SECTION ===
Device Model: WDC WD5001AALS-00E3A0
Serial Number: WD-WCATR0460005
Firmware Version: 05.01D05
User Capacity: 500,107,862,016 bytes

I had the same problem. An Ubuntu 10.04 installation with RAID1 on 2 500GB harddisks will not boot!

Thanks to the hint from Manny, I created the raid partitions with the installer manually and even left some space free at the end of each disk, and it worked fine!

Vladimir Smolensky (arizal) wrote :

Okay, my partition also seems to end at cylinder 60802, while the disk has 60801 cylinders...
I made the last partition by hand, but it's a logical partition, and the extended partition it sits in was made by the installer, ending at 60802...

Changed in mdadm (Ubuntu Lucid):
assignee: nobody → Colin Watson (cjwatson)
Ky Weichel (kweichel) wrote :

I can confirm this issue as well, on a Dell PowerEdge R210 with two 500GB drives.

Device Model: WDC WD5002ABYS-18B1B0
Serial Number: WD-WCASYC640636
Firmware Version: 02.03B04
User Capacity: 500,107,862,016 bytes

Device Model: WDC WD5002ABYS-18B1B0
Serial Number: WD-WCASYC631505
Firmware Version: 02.03B04
User Capacity: 500,107,862,016 bytes

The boot problem was the same (dumped to initramfs prompt on reboot after install). I found the same cylinder 60802 anomaly with the partition tables when I used the installer's partitioner to create them.

My partitions consisted of a 484GB main partition (marked bootable) and a 16GB swap partition on each disk, all set to type FD (Linux RAID autodetect) and added to /dev/md0 and /dev/md1 respectively. md0 was then formatted ext4 and mounted to /, and md1 was set to swap.

To work around the issue, I created my partitions with fdisk instead (alt-switched to another tty during the partitioning step) and then created my RAID sets in the installer's partitioner. As a result I could still create partitions that filled the whole drive. This worked and the system now boots properly.

My partition table looks like this:

root@belair-auto2:/# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b35fa

   Device Boot Start End Blocks Id System
/dev/sda1 * 1 58843 472654848 fd Linux raid autodetect
/dev/sda2 58843 60801 15728160+ fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c1204

   Device Boot Start End Blocks Id System
/dev/sdb1 * 1 58843 472654848 fd Linux raid autodetect
/dev/sdb2 58843 60801 15728160+ fd Linux raid autodetect

(kweichel...md devices snipped)

Mike Perry (mike.perry) wrote :

Wow, I thought I was going crazy until I saw this bug. Here is my setup:

Two identical 500G Seagate drives. I sliced each of them up into 3 partitions using the alternative installer.

sda1, sdb1 - 1G - raid1 as md0 for boot
sda2, sdb2 - 5G - swap
sda3, sdb3 - remaining space as prompted - raid1 as md1 for LVM

The install appears to go fine; in fact, I didn't really notice there was a problem until I examined my boot logs and found that my swap on sdb2 wasn't found. When inspecting the drives I found that I had sda1,2,3, but there were no partitions on sdb. Instead, I have md1p1,p2,p3.

I'm surprised the system was even working. I experimented with a few different configurations and got similar results.

Mike Perry (mike.perry) wrote :

I was able to get RAID1 working directly from the install by following the advice in this thread. Specifically, I manually partitioned my drives with fdisk, and did not have any RAID partition fill up the entire disk.

Ky Weichel (kweichel) wrote :

Mike, if you're using fdisk, you actually _can_ have your partitions fill the whole disk. That's what I was trying to convey in my post.

I also can confirm that partitioning drives manually from another tty during installation solves the problem.

Lars Steinke (lss) wrote :

While I encountered a similar problem when upgrading from 8.04 LTS, my observations might possibly prove helpful for fresh installs as well:
- 10.04 dropped me to the initramfs shell with: "ALERT!! /dev/md0 does not exist. Dropping to a shell!"
- I was then able to continue booting after issuing "mdadm --auto-detect; exit"
- A subsequent "grub-install /dev/md0" fixed the initramfs problem with /dev/md0 for me.
Please note this is still grub 0.97, as that doesn't seem to be upgraded automatically...

Stu Thompson (stu-comp) wrote :

I've also had the same issue, but with smaller disks: WD RE3 250GB (WD2502ABYS)

Using the suggested sizes when creating the md* devices (a single ext3 / partition + one swap partition) resulted with the "Invalid argument" error message and the initramfs prompt.

Manually defining the size to be slightly less than the suggested size worked like a charm.

Stu

RichardN (richardn) wrote :

I have this exact same issue. 500.1GB Seagate disks and 64-bit 10.04 server. Have tried installing in a couple of different configurations with no luck. The first time I got the "mount: mounting /dev/disk/by-uuid/<uuid> on /root/ failed: Invalid argument" message, and I noticed that grub-mkconfig was detecting a UUID that, as far as I can tell, did not exist on the system. So I changed /etc/default/grub so that I didn't pass the UUID as a parameter. After that it wouldn't mount /proc.
The system works fine when installed on only one disk. I'll have one or two more goes at installing using some of the ideas here. If it still doesn't work I'll have to give up and use Debian.

Same issue here. I have 2 500 GB WD disks and was trying to install 10.04 server amd64. When partitioning with the installer's partitioner, the ending cylinder of the last partition was set to 60802 while the disks physically only have 60801.

Manually partitioning with fdisk in another tty solved the issue for me.

Lobo (alex-loffler) wrote :

Confirmed - after multiple install attempts resulting in /dev/md definition weirdness and a (initramfs) prompt, I finally have 10.04 installed on RAID1 /dev/md partitions. Tried the repartition manually just before installing - no joy. I wiped the disks clean then used the installer and manually set the last partition to be ~1GB smaller than the partitioner was suggesting and _bang_ it worked flawlessly. Here are some details about the setup - you guessed it 2 x 500GB Seagate disks...:

lobo@test:~$ uname -a
Linux test 2.6.32-23-generic #37-Ubuntu SMP Fri Jun 11 07:54:58 UTC 2010 i686 GNU/Linux

/dev/sda:

 Model=ST3500320AS, FwRev=SD15, SerialNo=9QM7HLTS
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown: ATA/ATAPI-4,5,6,7

/dev/sdb:

 Model=ST3500320AS, FwRev=SD15, SerialNo=9QM63DNQ
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown: ATA/ATAPI-4,5,6,7

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000693db

   Device Boot Start End Blocks Id System
/dev/sda1 * 1 132 1060258+ fd Linux raid autodetect
/dev/sda2 133 394 2104515 fd Linux raid autodetect
/dev/sda3 395 60575 483397632 fd Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a1c27

   Device Boot Start End Blocks Id System
/dev/sdb1 * 1 132 1060258+ fd Linux raid autodetect
/dev/sdb2 133 394 2104515 fd Linux raid autodetect
/dev/sdb3 395 60575 483397632 fd Linux raid autodetect

lobo@test:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda3[0] sdb3[1]
      483397568 blocks [2/2] [UU]
      [==================>..] resync = 94.6% (457681664/483397568) finish=10.1min ...


Steve Langasek (vorlon) on 2010-07-05
Changed in mdadm (Ubuntu Lucid):
milestone: ubuntu-10.04 → none
Changed in mdadm (Ubuntu):
milestone: ubuntu-10.04 → none
ceg (ceg) wrote :

May this be related? (500GB disk): Bug #599515 mdadm misdetects disks instead of partition

Mattias Toom (matthew-toom) wrote :

Hi all,

I can confirm this bug and was lucky enough to find the bug report for it. I installed Ubuntu Server 10.04 64-bit on a machine with multiple RAID1 configurations (mdadm) working and was attempting to build a RAID1 with 2x 500gb drives for the OS.

Couldn't figure out why a fresh install of the OS wouldn't work and was getting worried since I have mdadm running a 2x2tb, and a pair of 2x1tb RAID1's. I thought the error might be relating to all of those RAID configurations confusing the installer somehow. It wasn't.

I got the error messages mentioned above.

It looks like there was an error partitioning the drives; after carefully partitioning the drives manually (USING the installer; I suppose this could have been done with fdisk), I left 100 megabytes of free space, as per Thomas Krause's post. The other change I made was answering NO to "do you want the system to boot if one of the OS drives is degraded?". I think it was the partitioning change that fixed the problem.

Anyways, after the 2nd install Ubuntu Server boots fine, automatically recognizes all my arrays and life is beautiful.

M. Toom BSc(comp.sci)

Mattias Toom (matthew-toom) wrote :

Further, to add some specificity, the free space I left was at the end of the drive. So, I did about 8 gb for swap, then the remaining size of the disk minus 0.1 gb for the bootable ext4 partition.

Thanks to all for creating this thread (and contributing to it) to document the bug, it was very useful for me and I was able to fix the issue fairly quickly after finding it.

Unlogic (unlogic-unlogic) wrote :

I can also confirm this bug. I recently installed two servers both with two 500GB drives configured using raid 1 with Linux software raid.

The installation goes smooth but when I reboot I end up with a busybox prompt and a nonworking raid setup.

If I partition the drives and create the RAID using a Mandriva 2010.1 disc and then install Ubuntu 10.04 on top of those partitions, it works just fine.

I'm very surprised to see that the mighty Ubuntu distro is having such serious bugs, I hope this gets sorted out quickly!

Btw. Here is a whole thread at the forums discussing this issue: http://ubuntuforums.org/showthread.php?t=1474950

Plutocrat (plutocrat) wrote :

Confirmed here as well. I couldn't believe that such a fundamental bug would be released, but really, yes, my RAID system is unbootable after using the installer. Two and a half days wasted.
I've got an IBM 3200 server with four 500Gb disks. My partitioning scheme is as follows.
On each disk:
/boot 1Gb
/ 15Gb
swap 2Gb
/home 482Gb
Then I have a
md0 RAID 1 on the four boot partitions (sd[abcd]1),
md1 RAID 5 across the / partitions (sd[abcd]2),
md2 RAID 5 across the /home partitions (sd[abcd]4)

The /etc/mdadm/mdadm.conf file looks correct before I reboot.

After install I get the initramfs prompt.

Doing cat /proc/mdstat tells me only the md2 array is detected and it's rebuilding.
mdadm --detail /dev/md2 tells me that it is composed of sda, sdb, sdc, sdd, i.e. the WHOLE disks.

I can do mdadm --stop /dev/md2
and then.
mdadm --assemble /dev/md0 /dev/sd[abcd]1
mdadm --assemble /dev/md1 /dev/sd[abcd]2
mdadm --assemble /dev/md2 /dev/sd[abcd]4

After this the md2 only adds 3 of the 4 disks, but I can now boot by typing exit to get out of the initramfs.

I've tried all my tricks to get this config to stick, but apparently on every reboot I have to manually stop the wrong array and manually assemble them all again.

This is the amd/64 iso for ubuntu server 10.04. I also had the same problem with the 386 version.

I've tried wiping the partition tables, reformatting the drives, and zeroing the superblocks, and none of these fix it. Two and a half days. Is the bug in mdadm, the partitioner, or what?

Plutocrat (plutocrat) wrote :

PS if anyone can suggest ways I can get the manually assembled arrays to 'stick' I'd be grateful. I'm not sure I could go through another install.

Guido Scalise (guido-scalise) wrote :

After an entire day lost trying to install 10.04.1 LTS on a brand new Dell PowerEdge R210 with two 500GB drives, I found this bug report, followed Thomas Krause's workaround (creating the last partition 100MB smaller, thus leaving 100MB unused), and was finally able to boot.

I can't believe such a gross bug was released. An entire day of work lost.

It should also be noted that during installation, grub offers to automatically install on one of the disks' MBR (/dev/sda in my case), when the correct thing to do would be to install it on both /dev/sda and /dev/sdb

Ky Weichel (kweichel) wrote :

The bug is in the installer's partitioner, NOT in mdadm. As previously stated, the partitions are being created by the installer with an end cylinder number that is one greater than the actual end of the disk.

All you have to do is create your _partitions_ somewhere other than the Ubuntu Server Installer. You can use a GParted disc, boot to your favourite Live CD and fdisk them, or whatever is easiest for you. Just make sure you set their type to Linux Raid Autodetect (that's type "FD" in fdisk).

That way you can even make them take up your whole disk and you don't have to waste 100MB of space.

You can then fire up the Ubuntu Server Installer and create your RAID sets on the partitions you made elsewhere.

Unlogic (unlogic-unlogic) wrote :

I created a script that wipes my disks of any superblocks and partitions and then recreates the partitions again (using a stored partition table with sfdisk) and the raid devices using mdadm.

If I start the Ubuntu installer, switch to another console and run the script and then proceed to install Ubuntu on the created partitions this bug occurs and the system gets stuck on first boot.

If I boot up another Linux install disc, run the script and then start the Ubuntu installer again and install Ubuntu on the created partitions all works fine.

So I'm not entirely sure that the bug is in the installer's partitioner. In my case above I didn't touch the partitioner and used the following partition table with sfdisk instead:

# partition table of /dev/sda
unit: sectors

/dev/sda1 : start= 2048, size= 19529728, Id=fd, bootable
/dev/sda2 : start= 19531776, size=947265536, Id=fd
/dev/sda3 : start=966797312, size= 9975808, Id=fd
/dev/sda4 : start= 0, size= 0, Id= 0

Plutocrat (plutocrat) wrote :

@Ky Thanks for the feedback. I was a little jaded when I wrote my post. I bit the bullet on Monday morning and went through the install again. First of all I booted into a Gparted Live CD and wiped all RAID Arrays and partitions. I rebooted into it again to check, and had to remove a new RAID array md127? which had been created. After that the disks were clear.

I then ran the installer, creating my partitions, but leaving a space at the end of the disk after the last partition. In my case this was /dev/sd[abcd]4, and I left 1Gb (although I gather less will also work). The RAID arrays assembled OK and I could reboot after install. So, just confirming the workaround ...

Plutocrat (plutocrat) wrote :

@Guido - I think grub will offer to install on all partitions marked as 'boot'. In my case this was all four /dev/sd[abcd]1 partitions, so it offered to install on all of them.

Keith Cornwell (apex-thing2) wrote :

I opened bug #605720 for what I think is the root cause of this problem. I don't know how the installer does partitioning but it exhibits the same behavior.
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/605720

Doug Jones (djsdl) wrote :

This bit me too. Two 500GB drives, RAID1, using 10.04.1 alternate 386 installer.

Reading through all these comments, and those on similar (possibly related) bugs, it seems like this is caused by an arithmetic error in some code that figures out where things ought to be on the disk.

Suddenly I am reminded of another arithmetic error that cropped up in gparted recently, relating to the switchover from align-to-cylinder to align-to-megabyte. Didn't the default partition alignment method just change in Lucid?

Very suspicious...

Thierry Carrez (ttx) on 2010-09-03
tags: added: server-mro
Jean-Luc Boss (jlb74) wrote :

Had the same error with a 64-bit install on a Dell OptiPlex 780 with 2 500Gb disks...
the workaround of shrinking the last partition by ".1" GB worked for me

D_A_N_K_O (russian-robotic) wrote :

I have the same problem with 2x500Gb Seagate HDDs. I can't create RAID 1.
Ubuntu Server 10.04 with all updates

/dev/sda:

ATA device, with non-removable media
 Model Number: GB0500EAFJH
 Serial Number: 9QM8L22V
 Firmware Revision: HPG6
Standards:
 Used: ATA/ATAPI-7 T13 1532D revision 4a
 Supported: 7 6 5 4 & some of 8
Configuration:
 Logical max current
 cylinders 16383 16383
 heads 16 16
 sectors/track 63 63
 --
 CHS current addressable sectors: 16514064
 LBA user addressable sectors: 268435455
 LBA48 user addressable sectors: 976773168
 Logical Sector size: 512 bytes
 Physical Sector size: 512 bytes
 device size with M = 1024*1024: 476940 MBytes
 device size with M = 1000*1000: 500107 MBytes (500 GB)
 cache/buffer size = unknown
 Nominal Media Rotation Rate: 7200
Capabilities:
 LBA, IORDY(can be disabled)
 Queue depth: 32
 Standby timer values: spec'd by Standard, no device specific minimum
 R/W multiple sector transfer: Max = 16 Current = ?


Alexandr (olexandr-dmitriev) wrote :

The same issue for me: 10.04.1 - 2 500GB WD disks, RAID1 and BusyBox after install. Manual partitioning, reducing the last partition by 100 MB, helped... Very sad.

Luigi Messina (grimmo) wrote :

I'm having very similar symptoms after installing 10.04.1 from scratch
with two 500Gb disks (WD and ST).
The system installs and boots correctly if the RAID1 array is created manually
from the CLI before partition detection.
But after some hours of uptime, errors start appearing in the logs and the array becomes degraded:

Sep 19 13:36:19 deepthought kernel: [ 278.248022] ata3.00: qc timeout (cmd 0x27)
Sep 19 13:36:19 deepthought kernel: [ 278.248027] ata3.00: failed to read native max address (err_mask=0x4)
Sep 19 13:36:19 deepthought kernel: [ 278.248033] ata3.00: disabled
Sep 19 13:36:19 deepthought kernel: [ 278.248039] ata3.00: device reported invalid CHS sector 0
Sep 19 13:36:19 deepthought kernel: [ 278.248049] ata3: hard resetting link
Sep 19 13:36:20 deepthought kernel: [ 279.128035] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 19 13:36:20 deepthought kernel: [ 279.128048] ata3: EH complete
Sep 19 13:36:20 deepthought kernel: [ 279.128057] sd 2:0:0:0: [sdb] Unhandled error code
Sep 19 13:36:20 deepthought kernel: [ 279.128059] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 19 13:36:20 deepthought kernel: [ 279.128062] sd 2:0:0:0: [sdb] CDB: Write(10): 2a 00 3a 38 5f 88 00 00 08 00
Sep 19 13:36:20 deepthought kernel: [ 279.128082] md: super_written gets error=-5, uptodate=0
Sep 19 13:36:20 deepthought kernel: [ 279.128105] sd 2:0:0:0: [sdb] Unhandled error code
Sep 19 13:36:20 deepthought kernel: [ 279.128106] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Sep 19 13:36:20 deepthought kernel: [ 279.128109] sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 06 a2 3c 80 00 00 20 00
Sep 19 13:36:20 deepthought kernel: [ 279.205366] RAID1 conf printout:
Sep 19 13:36:20 deepthought kernel: [ 279.205369] --- wd:1 rd:2
Sep 19 13:36:20 deepthought kernel: [ 279.205371] disk 0, wo:0, o:1, dev:sda
Sep 19 13:36:20 deepthought kernel: [ 279.205373] disk 1, wo:1, o:0, dev:sdb
Sep 19 13:36:20 deepthought kernel: [ 279.212009] RAID1 conf printout:
Sep 19 13:36:20 deepthought kernel: [ 279.212011] --- wd:1 rd:2
Sep 19 13:36:20 deepthought kernel: [ 279.212013] disk 0, wo:0, o:1, dev:sda

also in dmesg this message is present at every boot:

[ 3.022033] md1: p5 size 976269312 exceeds device capacity, limited to end of disk

These are the partitions as seen from sfdisk:

~$ sudo sfdisk -l /dev/sda

Disk /dev/sda: 30401 cylinders, 255 heads, 63 sectors/track
Warning: The partition table looks like it was made
  for C/H/S=*/81/63 (instead of 30401/255/63).
For this listing I'll assume that geometry.
Units = cylinders of 2612736 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start End #cyls #blocks Id System
/dev/sda1 0+ 95707- 95708- 244197560 83 Linux
  end: (c,h,s) expected (1023,80,63) found (705,80,63)
/dev/sda2 0 - 0 0 0 Empty
/dev/sda3 0 - 0 0 0 Empty
/dev/sda4 0 - 0 0 0 Empty

~$ sudo sfdisk -l /dev/sdb

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS a...


Luigi Messina (grimmo) wrote :

pardon, my system decided to switch device names when the array became degraded; the correct sfdisk output for the other disk is this one:

Disk /dev/sdc: 60801 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start End #cyls #blocks Id System
/dev/sdc1 * 0+ 31- 32- 249856 fd Linux raid autodetect
/dev/sdc2 31+ 60801- 60770- 488134657 5 Extended
/dev/sdc3 0 - 0 0 0 Empty
/dev/sdc4 0 - 0 0 0 Empty
/dev/sdc5 31+ 60801- 60770- 488134656 fd Linux raid autodetect

Colin Watson (cjwatson) wrote :

I finally got back to this bug and figured out what was going on. It took a while ...

A few people suggested that what was happening was that the partitioner was creating partitions that extended beyond the end of the disk. That wasn't actually quite right if you looked at the logs in detail and did the arithmetic; they were entirely within the disk, just extending onto the last (incomplete) cylinder, and there's nothing wrong with that in itself. However, there were log messages indicating that the md layer in the kernel thought that an md device was overflowing the disk, and this pointed me in the right direction.

When I tried to fix this bug before, I observed that what was happening was that mdadm was getting confused between /dev/sda and /dev/sda1 (or whatever the last partition happened to be). Since the 0.90 metadata format stores the superblock at the end of the device, there's obvious potential for confusion between a partition extending all the way to the end of the disk and the disk device itself. I fixed this, or so I thought, by constraining the installer's partitioner to never use the last sector of the disk. This fixed the problem in my tests.

Unfortunately, I apparently didn't quite do enough research on exactly what was happening. When I came back to this bug, I read the md(4) manual page, and found this:

  The common format - known as version 0.90 - has a superblock that is 4K long
  and is written into a 64K aligned block that starts at least 64K and less
  than 128K from the end of the device (i.e. to get the address of the
  superblock, round the size of the device down to a multiple of 64K and then
  subtract 64K).

(The 1.0 superblock format is similar, but is never more than 12K from the end of the device, so a fix for 0.90 will fix 1.0 too. 1.1 and 1.2 store their superblocks at or near the start of the device, and do not suffer from this problem.)

So, if you do the mathematics based on partman's current constraints, the result is that Ubuntu will currently get this wrong for any disk whose size is an exact multiple of 1048576 bytes plus any number between 512 and 65535. The 500GB disks common among commenters on this bug report are, according to the logs, 500107862016 bytes long, which is 476940 * 1048576 + 24576. I could never reproduce this in KVM before because my habit is to create disk images which are an exact number of megabytes (I usually just say '10G' or thereabouts), and such an image would never encounter this bug thanks to my previous attempted fix of avoiding the last sector.

The proper fix, then, is for partman to round the disk size down to 64K, subtract one further sector, and avoid any sectors after that.
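Colin's arithmetic can be checked directly. Below is a small sketch (my own illustration, not installer or mdadm code) of the 0.90 superblock-offset rule quoted from md(4), applied to the 500GB disks from the logs and to a partition laid out the way the Lucid installer did it: a 1 MiB-aligned start, ending one 512-byte sector short of the end of the disk.

```python
# Sketch of the v0.90 superblock placement rule from md(4):
# round the device size down to a 64 KiB multiple, then subtract 64 KiB.
SB_ALIGN = 64 * 1024

def sb_offset(device_size):
    """Byte offset where mdadm expects a v0.90 superblock on a device."""
    return (device_size // SB_ALIGN) * SB_ALIGN - SB_ALIGN

# The 500 GB disks reported in this bug: 476940 * 1048576 + 24576 bytes.
disk = 500_107_862_016

# Partition as the Lucid installer created it: starts 1 MiB in, and (after
# the earlier attempted fix) stops one 512-byte sector short of the disk end.
part_start = 1_048_576
part_size = disk - part_start - 512

# Where mdadm looks for a superblock on the whole disk...
on_disk = sb_offset(disk)
# ...and where the partition's superblock lands, in disk-absolute terms.
on_partition = part_start + sb_offset(part_size)

print(on_disk == on_partition)  # True
```

With these numbers the two candidate locations fall on the same 64 KiB block, which is exactly the disk-vs-partition ambiguity the enlarged partman gap is meant to avoid.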

Colin Watson (cjwatson) on 2010-09-28
Changed in grub2 (Ubuntu):
status: New → Invalid
Changed in grub2 (Ubuntu Lucid):
status: New → Invalid
affects: mdadm (Ubuntu) → partman-base (Ubuntu)
Changed in partman-base (Ubuntu):
status: Confirmed → In Progress
Colin Watson (cjwatson) wrote :

I'll upload a fix for 10.04.2 once we've tested this for 10.10.

summary: - mount: mounting /dev/md0 on /root/ failed: Invalid argument
+ partman sometimes creates partitions such that there is ambiguity
+ between whether the superblock is on the disk device or the partition
+ device
description: updated
Changed in partman-base (Ubuntu):
status: In Progress → Fix Committed
Changed in partman-base (Ubuntu Lucid):
status: Confirmed → Triaged
status: Triaged → Fix Committed
Colin Watson (cjwatson) on 2010-09-28
Changed in partman-base (Ubuntu Maverick):
milestone: none → ubuntu-10.10
Changed in partman-base (Ubuntu Lucid):
milestone: none → ubuntu-10.04.2

I struggled a lot, and was glad to read Colin Watson's post. I have a Dell R210 with an S100 software RAID controller and 2 500 GB hard disks. I thought the problem was coming from the S100 card all along, until I finally read all the explanations. Thanks a lot, Mr. Watson!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-base - 141ubuntu2

---------------
partman-base (141ubuntu2) maverick; urgency=low

  * Expand the small gap we leave at the end of the disk to avoid MD
    superblock ambiguity so that it correctly covers the region where
    ambiguity might arise. The previous gap was insufficient on disks that
    were between 512 and 65535 bytes larger than a multiple of 1048576 bytes
    (LP: #569900).
 -- Colin Watson <email address hidden> Tue, 28 Sep 2010 21:17:07 +0100

Changed in partman-base (Ubuntu Maverick):
status: Fix Committed → Fix Released
tombert (tombert.live) wrote :

I would like you to have a look at this one:
https://bugs.launchpad.net/unetbootin/+bug/661820

Curtis Dutton (curtdutt) wrote :

I was able to fix this problem after the failed install and avoid the re-install. Ubuntu Server 10.10

I also have the 500GB drives with a RAID1.

My partitions were done as default with /dev/sda1 and /dev/sdb1 as the root partitions and /dev/sda5 and /dev/sdb5 as the swap partitions.

I shrunk the swap partitions (/dev/sda5 and /dev/sdb5) by 2 "Units" using fdisk and then zeroed the raid superblocks.

The steps I took.

Load up the rescue disk and mount the installer root. In my case it was seeing md5 and md5p1 as my raid devices.

1. run "mdadm --stop /dev/md5"
2. fdisk /dev/sda
3. delete partition 5 (/dev/sda5)
4. create a new logical partition 5
5. use the default start
6. use End - 2 (in my case 60802 - 2 = 60800) as the End
7. change type to "fd" for "Linux raid autodetect"
8-14. repeat steps 2 - 7 with /dev/sdb
15. run mdadm --zero-superblock /dev/sda
16. run mdadm --zero-superblock /dev/sdb
17. reboot

After getting this to work, and realizing that a cylinder unit is about 8MB, I realize this solution loses out on 16MB on my disks. I'm sure a larger end value might work as well, but who cares. It boots now.

similar bug here: https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/612224

Well, I have to say I have a 500GB HDD as well, but I think that's just by accident.

as you can read here:
https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/612224

I solved the problem of not being able to boot by not putting swap into RAID 1.

greetings
scrapper

Colin Watson (cjwatson) on 2010-12-06
Changed in partman-base (Ubuntu Lucid):
status: Fix Committed → In Progress
description: updated

Accepted partman-base into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in partman-base (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) wrote :

Any testers of the lucid-proposed package? Please note that we have daily built lucid CDs which include -proposed, which allow you to test this. Thanks!

Imre Gergely (cemc) wrote :

Did the test on Lucid like this:

- created two identical qcow2 images:
    kvm-img create -f qcow2 disk1.qcow2 1048576512
    kvm-img create -f qcow2 disk2.qcow2 1048576512

disk image sizes 1048576512 bytes = 1048576*1000+512 (formula from the test case)

- first installed Lucid server i386 to reproduce the problem:
    kvm -m 512 -cdrom /store/Kits/isos/lucid/ubuntu-10.04.1-server-i386.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
- created RAID1 on sda1 and sdb1 (two partitions which stretch to the end of each disk)
- after reboot got the (initramfs) prompt and the errors

- recreated the images, reinstalled the latest Lucid:
    kvm -m 512 -cdrom /store/Kits/isos/lucid/mini.iso -hda disk1.qcow2 -hdb disk2.qcow2 -vnc 172.16.21.1:1 -cpu qemu32
- mini.iso taken from http://archive.ubuntu.com/ubuntu/dists/lucid/main/installer-i386/current/images/netboot/mini.iso , and booted with "cli apt-setup/proposed=true" parameters
- after install the guest booted just fine, RAID members were sync'ed, no problems.

Looks like the fix is working.
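For anyone else verifying the fix, the test-case condition from the bug description can be expressed as a quick check (a hypothetical helper of mine, not part of any test suite): a disk is at risk when its size is 1048576*n + r bytes with r between 512 and 65535.

```python
# Sketch: does a given disk size fall in the range the test case describes,
# i.e. 1048576*n + 512 .. 1048576*n + 65535 bytes for some integer n?
MIB = 1_048_576

def is_affected(size_bytes):
    """True if a pre-fix installer could create an ambiguous layout on this disk."""
    return 512 <= size_bytes % MIB <= 65535

print(is_affected(1_048_576_512))    # True  - the qcow2 image size used above
print(is_affected(500_107_862_016))  # True  - the 500GB disks in this bug
print(is_affected(10 * 1024**3))     # False - an exact number of MiB
```

This also shows why exact-megabyte KVM images never reproduced the problem.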


Martin Pitt (pitti) on 2011-01-26
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-base - 139ubuntu7

---------------
partman-base (139ubuntu7) lucid-proposed; urgency=low

  * Expand the small gap we leave at the end of the disk to avoid MD
    superblock ambiguity so that it correctly covers the region where
    ambiguity might arise. The previous gap was insufficient on disks that
    were between 512 and 65535 bytes larger than a multiple of 1048576 bytes
    (LP: #569900).
 -- Colin Watson <email address hidden> Mon, 06 Dec 2010 16:12:28 +0000

Changed in partman-base (Ubuntu Lucid):
status: Fix Committed → Fix Released
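The ambiguity window described in the changelog can be illustrated with a small sketch. This is a rough model, not partman's actual code; it assumes the kernel's MD 0.90 rule of placing the superblock 64 KiB before the end of the device, rounded down to a 64 KiB boundary, and the function names are illustrative:

```python
def sb_offset_0_90(device_size):
    """Byte offset of an MD 0.90 superblock on a device of the given
    size: 64 KiB before the end, rounded down to a 64 KiB boundary."""
    return (device_size & ~0xFFFF) - 0x10000

def ambiguous(disk_size, part_start, part_end):
    """True when the superblock written inside the partition sits at the
    exact byte where the kernel would also probe the whole disk, so both
    /dev/sda and /dev/sda1 appear to carry a valid superblock."""
    part_sb = part_start + sb_offset_0_90(part_end - part_start)
    return part_sb == sb_offset_0_90(disk_size)

MiB = 1024 * 1024

# A disk 512 bytes longer than a whole number of MiB, as in the test
# case: ending the partition on the last MiB boundary (the old, too
# small gap) still collides; ending it a further 64 KiB early does not.
disk = 1000 * MiB + 512
print(ambiguous(disk, 1 * MiB, 1000 * MiB))              # → True (old gap)
print(ambiguous(disk, 1 * MiB, 1000 * MiB - 64 * 1024))  # → False (expanded gap)
```

Because partman aligns the partition start to a 64 KiB multiple, the partition's end-relative superblock lands in the same 64 KiB window as the whole disk's whenever the disk's size modulo 1 MiB is between 512 and 65535 bytes, which is exactly the range in the changelog.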
Christian Brandt (brandtc) wrote :

There are other ways to trigger this bug too, and it will come back to haunt us in the future, and not only within Ubuntu.

It hit me when partitioning with fdisk -u /dev/sdx but not with plain fdisk /dev/sdx, because the first fits the partition tightly to the end of the drive while the second leaves some space. The same happened when using GPT through cfdisk (not sure, it's been a while).

All in all, it shouldn't be wrong to use a tightly fitting partition table. The real problem here is that a 0.90 superblock cannot be reliably attributed to either a device or a partition (or an LVM volume; in fact I can imagine scenarios where it's not only about drive versus partition but also about other mappings like LVM or crypt; even a stupidly placed part of a filesystem could qualify as a superblock). Until now it worked because the superblock scan happened to use a less error-prone probing order (in fact even then the scan usually ran head first into a wall, but by accident this didn't reach the user).

In short: placing vital information at the end of a run of sectors and hoping that startup-time poking around finds the right owner is fragile and prone to error.

Everyone should use front-aligned superblocks, that is version 1.1 and/or 1.2, because every known mapping (LVM, MD, crypt, filesystems) is able to preserve lead-in gaps and deliver this vital information to the next layer. Not so for lead-out gaps.
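The front- versus end-aligned distinction can be sketched as follows. The offsets match the documented mdadm metadata versions (0.90 end-aligned, 1.1 at the start, 1.2 at 4 KiB), but the function itself is only illustrative:

```python
KiB = 1024

def sb_offset(version, device_size):
    """Approximate MD superblock byte offset by metadata version.
    End-aligned formats depend on the device size, which is what
    creates the disk-vs-partition ambiguity; front-aligned formats
    place the superblock at a fixed offset regardless of size."""
    if version == "0.90":   # 64 KiB before the end, 64 KiB granularity
        return (device_size & ~(64 * KiB - 1)) - 64 * KiB
    if version == "1.1":    # at the very start of the device
        return 0
    if version == "1.2":    # 4 KiB from the start
        return 4 * KiB
    raise ValueError(version)

# Two devices of different size: the 0.90 offsets differ, while the
# 1.1/1.2 offsets are identical and therefore unambiguous.
for v in ("0.90", "1.1", "1.2"):
    print(v, sb_offset(v, 500 * 10**9), sb_offset(v, 500 * 10**9 - 1024**2))
```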

Orticio Jlgtgutisu (jlgutisu3) wrote :

[Translated from Spanish:] Please, write in Spanish.


Colin Watson (cjwatson) wrote :

Christian: We use metadata version 1.1 by default as of Natty. You're right that it's generally a much better design. However, this is not easily backportable, at least not to Lucid, because we only added support for 1.x superblocks to GRUB in Maverick.

tags: added: testcase
astrostl (astrostl) wrote :

I think I ran into this. I have an 11.04 system, set up in ways unknown. sdc1 and sdd1 were mirrored via mdadm, and sdd failed. I replaced it with an identical drive, partitioned it with fdisk, and was told the partition was too small when I tried to add it back into the array.

fdisk -l /dev/sdc output:

Disk /dev/sdc: 300.1 GB, 300069052416 bytes
255 heads, 63 sectors/track, 36481 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0008dc33

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       36482   293035008   fd  Linux raid autodetect

36481 cylinder disk, with a partition ending on non-existent cylinder 36482.
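The arithmetic can be checked against the fdisk output above. Sizes are taken verbatim from that output; the 1 MiB partition start is an assumption for illustration, since fdisk only shows the start in cylinders here:

```python
disk_bytes = 300069052416            # from "Disk /dev/sdc: ... bytes"
cyl_bytes = 16065 * 512              # 8225280 bytes per cylinder
full_cylinders = disk_bytes // cyl_bytes
print(full_cylinders)                # → 36481

part_bytes = 293035008 * 1024        # Blocks column is in 1 KiB units
part_start = 1024 * 1024             # assumed 1 MiB alignment
part_end = part_start + part_bytes

# The partition ends past the last full cylinder boundary but still
# within the disk, i.e. inside the partial "36482nd" cylinder.
print(part_end > full_cylinders * cyl_bytes)   # → True
print(part_end <= disk_bytes)                  # → True
```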

I could get it to start rebuilding by adding the entire /dev/sdd device rather than a partition, but it felt weird pairing a partition with a device. I dug around a bit on Google, and ended up here.

My "solution": sfdisk -d /dev/sdc | sfdisk --force /dev/sdd (using --force because sfdisk doesn't like the way sdc1 looks either).

That got me a clone of sdc's partition layout. It doesn't explain how it got there to start, though, or if I'm skating on thin ice overall.

Phillip Susi (psusi) wrote :

Your issue does not seem to be related to this bug report.
