raid1 clean and sync'd prior to reboot, after reboot, random bind and unbind resulting in degraded md

Bug #996908 reported by DynamoTester
This bug affects 1 person
Affects      Status  Importance  Assigned to  Milestone
Linux Mint   New     Undecided   Unassigned

Bug Description

I have had some experience running mdadm on Ubuntu since release 10.04.
I use mdadm raid1 to manage 12 mirrored partitions across 4 HDDs
(all partitions are mirrored). I am using metadata 0.90 and allow booting in degraded mode.
The mdadm version is v3.1.4 (31st August 2010).

Recently, I installed Linux Mint 12 from the 64-bit DVD and had everything set up properly.
I waited for all md arrays to sync, and /proc/mdstat indicated everything was fine.

After rebooting the system, I noticed that a few md arrays were "active with 1 out of 2 mirrors."
So, I sync'd them again and rechecked /proc/mdstat before rebooting.
After another reboot, other, seemingly random, arrays started in degraded mode.
Almost every reboot left at least one array degraded.
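
To check for this after each boot, the status lines in /proc/mdstat can be scanned for a missing member. The following is only a minimal sketch: the sample text (array names other than md211, block counts) is made up for illustration and is not output from the affected system.

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # A status line looks like "... [2/1] [U_]"; "_" marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Hypothetical sample in the usual /proc/mdstat layout (values are illustrative).
SAMPLE = """\
Personalities : [raid1]
md211 : active raid1 sdc11[0]
      10482304 blocks [2/1] [U_]

md210 : active raid1 sdc10[0] sdd10[1]
      10482304 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a live system the function would be fed the contents of /proc/mdstat instead of SAMPLE.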

I noticed in syslog that after both devices were bound, one of them would then be unbound:

 md: bind<sdc11>
 md: kicking non-fresh sdd11 from array!
 md: unbind<sdd11>
 md: export_rdev(sdd11)
 md/raid1:md211: active with 1 out of 2 mirrors
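
With 12 arrays it helps to pull these events out of syslog automatically. A rough sketch, assuming the kernel messages keep exactly the wording shown in the excerpt above; the function name is hypothetical.

```python
import re

# Patterns follow the wording of the kernel messages quoted in this report.
KICK_RE = re.compile(r"md: kicking non-fresh (\S+) from array!")
ACTIVE_RE = re.compile(r"md/raid1:(md\d+): active with (\d+) out of (\d+) mirrors")

def summarize_md_log(log_text):
    """Return (kicked member devices, degraded arrays) found in a syslog excerpt."""
    kicked = KICK_RE.findall(log_text)
    degraded = [
        (name, int(active), int(total))
        for name, active, total in ACTIVE_RE.findall(log_text)
        if int(active) < int(total)
    ]
    return kicked, degraded

# The excerpt from this report.
LOG = """\
 md: bind<sdc11>
 md: kicking non-fresh sdd11 from array!
 md: unbind<sdd11>
 md: export_rdev(sdd11)
 md/raid1:md211: active with 1 out of 2 mirrors
"""

print(summarize_md_log(LOG))
```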

After much research and probing, I noticed that the fourth HDD
that was being mirrored had a slightly different partition table.
It appears that the fdisk utility (util-linux 2.19.1) was modified recently to use
sectors instead of cylinders as its units. Using this tool, I had manually created the
partition table for my fourth HDD, using the third HDD as a reference and using cylinder units.

Comparing the partition tables afterwards, one table started at sector 2048 and the other at 63.
As a result, every partition on the fourth HDD started at a slightly different sector than its counterpart.
The partition table on the third HDD was created back with Ubuntu 10.04. The partition table
on the fourth HDD was created now using fdisk. The fourth HDD was checked for errors and wiped
clean with badblocks before I created the partitions and added them to the existing md arrays.
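
This kind of mismatch can be spotted by comparing `sfdisk -d` dumps of the two disks. The sketch below assumes the dump lines have the form `/dev/sdXN : start= ..., size= ...`; the sizes in the sample are invented, only the start sectors 63 and 2048 come from this report.

```python
import re

# Matches sfdisk dump entries such as "/dev/sdc1 : start=  63, size=  1000, Id=fd".
ENTRY_RE = re.compile(
    r"^/dev/[a-z]+(\d+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+)", re.M
)

def parse_dump(text):
    """Map partition number -> (start sector, size in sectors)."""
    return {int(n): (int(start), int(size)) for n, start, size in ENTRY_RE.findall(text)}

def layout_differences(ref_dump, other_dump):
    """Return partitions whose (start, size) differ between the two dumps."""
    ref, other = parse_dump(ref_dump), parse_dump(other_dump)
    return {
        n: (ref.get(n), other.get(n))
        for n in sorted(set(ref) | set(other))
        if ref.get(n) != other.get(n)
    }

# Hypothetical dumps: same size, different start sector.
REF = "/dev/sdc1 : start=       63, size= 20964762, Id=fd\n"
OTHER = "/dev/sdd1 : start=     2048, size= 20964762, Id=fd\n"

print(layout_differences(REF, OTHER))
```

An empty result means the two tables agree on every partition's start and size.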

From what I understand, partition size should not matter. The mirror will use
the smallest partition size for raid1.
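
As a rough illustration of that expectation: with 0.90 metadata the superblock sits in a 64 KiB-aligned, 64 KiB reserved area at the end of each member, and the mirror can be no larger than its smallest member. The sketch below assumes 512-byte sectors and uses hypothetical function names; it is not a statement of how mdadm computes the final array size in every case.

```python
RESERVED_SECTORS = 128  # 64 KiB reserved for the 0.90 superblock, in 512-byte sectors

def usable_sectors_v090(partition_sectors):
    """Sectors left for data once the end-of-device 0.90 superblock area is set aside."""
    return (partition_sectors & ~(RESERVED_SECTORS - 1)) - RESERVED_SECTORS

def mirror_sectors(member_sizes):
    """A raid1 mirror is limited by its smallest member."""
    return min(usable_sectors_v090(size) for size in member_sizes)

# Two members of different size: the smaller one determines the mirror size.
print(mirror_sectors([1000, 2000]))
```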

I experimented and found a solution: I removed all of the fourth HDD's
partitions from their md arrays, leaving the arrays in degraded mode.
I then cleared all md metadata on the fourth HDD and used sfdisk to
create an exact duplicate of the third HDD's partition table.
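
The procedure could look roughly like the plan generated below. This is a dry-run sketch that only builds command strings and executes nothing. The device names, the array-to-partition mapping, and the final `--add` step are assumptions for illustration; the exact commands used are not given in this report.

```python
def rebuild_plan(ref_disk, new_disk, members):
    """Build the command sequence as strings (nothing is run).

    members maps an md array name to the partition number it uses,
    e.g. {"md211": 11}. All names here are illustrative assumptions.
    """
    plan = []
    for md, part in members.items():
        dev = f"/dev/{new_disk}{part}"
        # Take the member out of the array, leaving the array degraded.
        plan.append(f"mdadm /dev/{md} --fail {dev} --remove {dev}")
        # Clear the md metadata on the removed member.
        plan.append(f"mdadm --zero-superblock {dev}")
    # Copy the reference disk's partition table exactly.
    plan.append(f"sfdisk -d /dev/{ref_disk} | sfdisk /dev/{new_disk}")
    for md, part in members.items():
        # Assumed follow-up: add the recreated partition back so it resyncs.
        plan.append(f"mdadm /dev/{md} --add /dev/{new_disk}{part}")
    return plan

for command in rebuild_plan("sdc", "sdd", {"md211": 11}):
    print(command)
```

If a member has already been kicked from its array, the `--fail`/`--remove` step for it would be unnecessary.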

Now the system is stable. I no longer observe unbind and degraded behavior after reboots.
My observation is that somehow mdadm is affected by this different partition alignment.
Let me know if you need further info.

affects: community.linuxmint.com → linuxmint